What Makes Something "Beneficial"?


Sean Pitman
Dec 22, 2003, 5:05:48 PM
From: Chris Merli (clm...@insightbb.com)
Subject: Re: Bill Roger's question for Sean Pitman
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=CQsvb.203225%24275.755591%40attbi_s53&rnum=5
Date: 2003-11-21 10:31:29 PST
>
> If we simply gave a definition to each three-letter
> word then they would all have meaning. Perhaps more
> to the point, three-letter words have meaning only
> because we have defined them as having meaning. In a
> similar way you have dismissed most of the protein
> sequences as useless, but given the proper context
> couldn't every protein have some function? I guess
> this could be easily disproved if you simply showed
> one that has no function in any context.

The context of protein meaning/function is created by the individual
organism as it interacts with its particular environment. It really
doesn't matter if a particular protein might have some beneficial
meaning/function in a different organism/environment. The question
is, does it have some sort of beneficial function where it is right
now in the particular organism in which it might be found? Nature
cannot select to keep a particular protein in a particular gene pool
just because it may have some sort of beneficial function elsewhere.
It really doesn't matter what its function may or may not be
elsewhere. Nature only looks at what works right now in a particular
creature. Nature cannot plan ahead or say to itself, "Hey, this
protein would work great if only it were in a different creature or a
different place in the genome." If it is not working right now where
it is, nature simply will not select to keep it.

The fact of the matter is that from the perspective of a particular
organism the vast majority of possible proteins do not have a
beneficial function. This ratio of beneficial vs. non-beneficial is
much higher for those functions that require relatively small proteins
(20 or 30aa), but it decreases in an exponential manner for each
additional fairly specified amino acid that is required for minimum
function of a particular type to be realized. After just a few
hundred fairly specified amino acids, the density of beneficial vs.
non-beneficial sequences becomes so minuscule that the mindless
processes of evolutionary change simply cannot find new sequences with
new types of functions very easily. At the level of just a few
thousand fairly specified amino acids evolutionary processes stall out
completely this side of trillions upon trillions of years. Examples
of functions that require such levels of specified complexity include
bacterial motility systems, such as the flagellar apparatus, which
requires at least 20 to 30 different types of proteins totaling well
over 5,000 amino acids working together at the same time for the
function of flagellar motility to be realized. Evolutionary processes
simply cannot evolve new types of functions at such levels of minimum
specified complexity in what anyone would call a reasonable amount of
time.
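
A minimal Python sketch of the arithmetic this claim rests on (the
one-residue-in-20-per-position specificity is precisely the assumption
the replies below dispute; the numbers are illustrative only):

import math

def log10_beneficial_fraction(specified_positions):
    # If each "fairly specified" position tolerated exactly 1 of the 20
    # amino acids, the qualifying fraction of sequence space would be
    # (1/20)**n. Work in log10 to avoid float underflow at large n.
    return -specified_positions * math.log10(20.0)

for n in (30, 100, 300, 1000):
    print(n, "fraction ~ 1e%d" % round(log10_beneficial_fraction(n)))
# 30 -> ~1e-39, 100 -> ~1e-130, 300 -> ~1e-390, 1000 -> ~1e-1301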

Sean
www.naturalselection.0catch.com

Chris Merli
Dec 22, 2003, 5:44:47 PM

"Sean Pitman" <seanpi...@naturalselection.0catch.com> wrote in message
news:80d0c26f.03122...@posting.google.com...

I think this simply supports my point. You dismiss proteins as useless
because they are not functional in the current individual at the present
moment. There is no reason to suspect they were useless in a recent
ancestor under different conditions. In order to correctly identify
proteins that are useful you would have to identify all active sites
(notice whole proteins are not the issue, simply active sites) that were
useful not only in the current organism but in all possible ancestors. A
protein that was useful in an ancient species and is not significantly
altered may easily be recruited in a new evolutionary event.


>
> The fact of the matter is that from the perspective of a particular
> organism the vast majority of possible proteins do not have a
> beneficial function. This ratio of beneficial vs. non-beneficial is
> much higher for those functions that require relatively small proteins
> (20 or 30aa), but it decreases in an exponential manner for each
> additional fairly specified amino acid that is required for minimum
> function of a particular type to be realized. After just a few
> hundred fairly specified amino acids, the density of beneficial vs.
> non-beneficial sequences becomes so minuscule that the mindless
> processes of evolutionary change simply cannot find new sequences with
> new types of functions very easily. At the level of just a few
> thousand fairly specified amino acids evolutionary processes stall out
> completely this side of trillions upon trillions of years.

The trouble, as was pointed out elsewhere, is that proteins are not one
massive single-function entity but a series of smaller active sites. To
make matters worse, these active sites usually require only a small
number of specific amino acids.

> Examples
> of functions that require such levels of specified complexity include
> bacterial motility systems, such as the flagellar apparatus, which
> requires at least 20 to 30 different types of proteins totaling well
> over 5,000 amino acids working together at the same time for the
> function of flagellar motility to be realized.

I am afraid only a small fraction of these 5,000 amino acids are really
critical to the function of the proteins. This seems to be the fundamental
misunderstanding that is driving the neutral gaps idea.

RobinGoodfellow
Dec 23, 2003, 1:29:00 AM
seanpi...@naturalselection.0catch.com (Sean Pitman) wrote in message news:<80d0c26f.03122...@posting.google.com>...

This is correct, but it cuts both ways, as Chris already pointed out.
Just because something is not beneficial at time X under a certain set
of conditions does not mean it would not be beneficial - and thus
selected for - at time Y under a different set of conditions.

Take your lactase argument, for example. You claim that the lactase
function must be very rare because we have never seen lactase evolve
in the wild in "billions of bacterial generations". (I'd love to see
how you've arrived at this figure, by the way.) However, as others
have pointed out to you, the ability to process lactose just wouldn't
be all that beneficial to most bacteria, since an overwhelming
majority of them do not live in lactose-rich environments. Even for
those bacteria that find themselves in such environments, evolving a
rudimentary lactase function will not necessarily be beneficial, since
they will be competing with bacteria that are already highly efficient
at hydrolyzing lactose. That is, currently in the wild, there isn't
much of a niche for evolving the lactase function. When such a niche
does appear, as happened in Hall's experiment, or in the wild with
the nylonase enzyme, evolution eventually will find a rudimentary
solution capable of filling that niche. If it is sufficiently useful
for the organism, this solution can then be refined, by incremental
mutations, to something better suited for performing the new function.
Such mutations are effectively irreversible, since a reverse mutation will
result in worse performance and will be selected out. This is how
specificity of sequence to function evolves. The notion of a function
*requiring* a specific sequence is complete bollocks unless you can
demonstrate, clearly and unambiguously, that the function can only be
carried out by a molecular system obeying some highly specific
physical and chemical constraints. For instance, in case of the
lactase function, you would need to demonstrate that only a handful of
protein folds are suitable for hydrolyzing lactose, and that the
proteins must be composed of some very specific amino acids to
maintain these rare folds. So far, you've done a fine job of
asserting that such functions exist, but provided no evidence that
they do. In fact, if our resident biochemist sweetness is to be
believed, there is no black art involved in hydrolyzing lactose, and a
rudimentary molecule capable of doing so should be easy to obtain.

> The fact of the matter is that from the perspective of a particular
> organism the vast majority of possible proteins do not have a
> beneficial function. This ratio of beneficial vs. non-beneficial is
> much higher for those functions that require relatively small proteins
> (20 or 30aa), but it decreases in an exponential manner for each
> additional fairly specified amino acid that is required for minimum
> function of a particular type to be realized.

Again, this is only valid once you show that only a certain (long)
sequence of amino acids is *absolutely required* for the function.
Which you haven't.

But let us dispense with the silly notion of "function" altogether,
shall we? After all, you and I both know that evolution is
non-teleological: that is, from the evolutionary perspective, it
is no more the function of the flagellum to grant motion to bacteria
than it is the function of a tornado to spin in circles and toss cows.
Your claim is grander: you claim that evolution cannot produce any
system of a certain complexity, as measured by, say, the number of
interacting amino acids in the system. So, let me ask you this:
of all possible 5000-amino-acid sequences, how on earth do you or
anyone determine which ones are and which ones aren't "beneficial",
especially given ever-changing contexts? After all, this is crucial
for your "ratio of beneficial sequences" argument (which is actually
irrelevant anyway, as I'll explain below).

> After just a few
> hundred fairly specified amino acids, the density of beneficial vs.
> non-beneficial sequences becomes so minuscule that the mindless
> processes of evolutionary change simply cannot find new sequences with
> new types of functions very easily.

Aha! You are equating "density" and "ratio", which is simply
wrong. If one were to adopt your model of sequence space (and one
really shouldn't!), the ratio would tell you about the relative number
of "beneficial" sequences, but nothing about their *density* - i.e.
distribution throughout the sequence space. That is, the relative
number of sequences could be tiny, but they could be either all
bunched together (very dense), or be separate, equally spaced-out
specks in the space (very sparse), or could be bunched into clusters
that are connected with one another in any way imaginable, to be
consistent with just about any hypothesis about the expected time that
it would take a random walk to get from one cluster to another.
Again, a random walk over a frozen n-dimensional sequence space is a
terrible model for evolution, but even if it were right, your
"statistics" would still be meaningless for the above reason. In
fact, your probability calculation only applies if evolution worked by
sampling entire n-amino acid sequences at random from this space -
which is exactly why your critics accuse you of thinking that
evolution operates this way.
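
A toy illustration of this ratio-versus-density distinction (an
unbiased walk on an abstract ring of states, not a model of evolution;
the sizes are arbitrary): two target sets with the identical 2% ratio
yield very different mean first-hit times depending on arrangement.

import random

def mean_first_hit(targets, size=500, trials=100, seed=42):
    # Mean number of +/-1 steps an unbiased random walk on a ring of
    # `size` states takes to first reach any target state, averaged
    # over uniformly random starting points.
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        pos = rng.randrange(size)
        steps = 0
        while pos not in targets:
            pos = (pos + rng.choice((-1, 1))) % size
            steps += 1
        total += steps
    return total / trials

size, k = 500, 10                        # ratio is 10/500 = 2% in both cases
clustered = set(range(k))                # one contiguous clump of targets
spread = set(range(0, size, size // k))  # evenly spaced single targets
print("clustered:", mean_first_hit(clustered))  # tens of thousands of steps
print("spread:   ", mean_first_hit(spread))     # a few hundred steps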

> At the level of just a few
> thousand fairly specified amino acids evolutionary processes stall out
> completely this side of trillions upon trillions of years. Examples
> of functions that require such levels of specified complexity include
> bacterial motility systems, such as the flagellar apparatus, which
> requires at least 20 to 30 different types of proteins totaling well
> over 5,000 amino acids working together at the same time for the
> function of flagellar motility to be realized.

Really? When people give you examples of large proteins evolving, you
are quick to point out that many of the amino acids in those proteins
are irrelevant, and the number of actual "specified" amino acids is
very small. So what on earth makes you think that all the amino
acids, in all the proteins involved in flagellar assembly, are relevant
and invariant?

> Evolutionary processes
> simply cannot evolve new types of functions at such levels of minimum
> specified complexity in what anyone would call a reasonable amount of
> time.

I agree. A few hundred million years is definitely unreasonable,
especially if you want the whole process reproduced in the lab in the
blink of an eye.

> Sean
> www.naturalselection.0catch.com

Cheers,
Robin.

howard hershey
Dec 24, 2003, 2:57:30 PM

Sean, to make a long story short, the ratio of beneficial versus
non-beneficial you give is utterly irrelevant to anything. The
denominator of your ratio is always *total sequence space* based on
taking 1/20 to the power of the total number of amino acids (or minimum
number, which, in those cases you want to be unevolvable, is always the
same as total number of amino acids). That denominator is utterly
irrelevant *unless* your model is that every new functional protein
arises by a random walk from a random protein sequence. The *real*
model of evolution assumes a quite different mechanism, the modification
of a pre-existing protein (or duplicate thereof) in a specific organism.

The sequence space that is *relevant* to evolutionary mechanism is the
sequence space encoded by the proteins in that organism. That is, the
only relevant question is if, in *this* non-random sequence space, there
is a sequence x number of changes away from a selectable functionality
required by an environmental change. The sequence space in any given
organism is most certainly not *even* a random sample of total sequence
space. Far from it. Almost every one of the proteins has some
functional utility already. That alone makes your "ratio", based as it
is on a denominator of *total sequence space*, GIGO.

> After just a few
> hundred fairly specified amino acids, the density of beneficial vs.
> non-beneficial sequences becomes so minuscule that the mindless
> processes of evolutionary change simply cannot find new sequences with
> new types of functions very easily. At the level of just a few
> thousand fairly specified amino acids evolutionary processes stall out
> completely this side of trillions upon trillions of years.

Only if one were ignorant enough to use a denominator of *total sequence
space* that requires evolution to search through total sequence space
for functional sequences. No model of evolution does that.

> Examples
> of functions that require such levels of specified complexity include
> bacterial motility systems, such as the flagellar apparatus, which
> requires at least 20 to 30 different types of proteins totaling well
> over 5,000 amino acids working together at the same time for the
> function of flagellar motility to be realized.

And Sean has not *even* justified the number of 5,000.

RobinGoodfellow
Dec 24, 2003, 6:36:18 PM
howard hershey wrote:

[snip]


>
> Sean, to make a long story short, the ratio of beneficial versus
> non-beneficial you give is utterly irrelevant to anything. The
> denominator of your ratio is always *total sequence space* based on
> taking 1/20 to the power of the total number of amino acids (or minimum
> number, which, in those cases you want to be unevolvable, is always the
> same as total number of amino acids). That denominator is utterly
> irrelevant *unless* your model is that every new functional protein
> arises by a random walk from a random protein sequence. The *real*
> model of evolution assumes a quite different mechanism, the modification
> of a pre-existing protein (or duplicate thereof) in a specific organism.

It is even worse than that. Even random walks starting at random points
in N-dimensional space can, in theory, be used to sample the states
with a desired property X (such as Sean's "beneficial sequences"), even
if the number of such states is exponentially small compared to the
total state space size. Such random walks are at the heart of
Monte-Carlo methods, used to solve a wide variety of problems in
physics, statistics, computer science, etc. The time requirements
for such a random walk would depend on the distribution of valid states
(i.e. "beneficial sequences") in the space, the transition probabilities
between each state, and, to a lesser extent, the starting point. Of
course, the size of each state (i.e. the dimension of the space) is also
a factor, but the key point (and what makes Monte-Carlo techniques so
useful) is that the relationship between time and state size need not be
exponential. Depending on the specific details described above, the
time requirement may be any function of state size - possibly even a
linear function! Again, I totally agree that a simple Monte-Carlo
process is a pitiful model for evolution - but Sean's statistics are way
off even when applied to this model. His probabilities would only be
valid if evolution worked by repeatedly generating N-amino-acid
sequences de novo *every time*, with selection only keeping "beneficial"
sequences. That is, Sean's calculation reflects the probability of
finding a desired state with property X by blind, uniform random
sampling of N-dimensional space. The best thing that could be said
about such an attempt to model evolution is that it is laughable. But
it appears that Sean does not realize that he is making this mistake,
and keeps repeating his probabilities like a broken record.
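
A crude sketch of the contrast drawn here. The first number is the
expected cost of blind, uniform, whole-sequence sampling (the regime
Sean's probabilities describe); the second is an actual run of a walk
that simply keeps single-letter changes that do not increase the
mismatch count. The walk is a deliberately cartoonish toy with a fixed
target and cumulative retention, not anyone's model of evolution, and
the target string is arbitrary.

import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"   # the 20 amino-acid letters
rng = random.Random(0)

def blind_expected_draws(n):
    # Uniform de-novo sampling of whole length-n sequences: the expected
    # number of draws to hit one specific target is 20**n.
    return 20 ** n

def walk_to_target(target):
    # Start from a random sequence; propose one random substitution at a
    # time and revert it whenever it increases the mismatch count.
    seq = [rng.choice(ALPHABET) for _ in target]
    tries = 0
    while True:
        mismatches = sum(a != b for a, b in zip(seq, target))
        if mismatches == 0:
            return tries
        tries += 1
        i = rng.randrange(len(target))
        old = seq[i]
        seq[i] = rng.choice(ALPHABET)
        if sum(a != b for a, b in zip(seq, target)) > mismatches:
            seq[i] = old                 # reject steps that make things worse

target = "MKTAYIAKQR"                        # arbitrary 10-letter target
print(blind_expected_draws(len(target)))     # 10240000000000 (~1e13 draws)
print(walk_to_target(target))                # typically several hundred tries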

[snip]

Sean Pitman
Dec 29, 2003, 7:13:57 PM
howard hershey <hers...@indiana.edu> wrote in message news:<bscr3q$12r$1...@hood.uits.indiana.edu>...


> Sean, to make a long story short, the ratio of beneficial versus
> non-beneficial you give is utterly irrelevant to anything. The
> denominator of your ratio is always *total sequence space* based on
> taking 1/20 to the power of the total number of amino acids (or minimum
> number, which, in those cases you want to be unevolvable, is always the
> same as total number of amino acids).

Yes. I'm interested in the total number of amino acids required for a
particular type of function to be realized at its most minimum
beneficial level of function. This level is very different depending
on the type of function in question. Those types of new functions
that require more than a couple thousand fairly specified amino acids
working together at the same time simply do not evolve no matter what
you start with, functional or not.

> That denominator is utterly
> irrelevant *unless* your model is that every new functional protein
> arises by a random walk from a random protein sequence. The *real*
> model of evolution assumes a quite different mechanism, the modification
> of a pre-existing protein (or duplicate thereof) in a specific organism.

That's exactly the model that I'm talking about. Starting with
pre-existing proteins having pre-existing individual and collective
beneficial functions, you will not see the evolution of any new type
of protein function that requires more than a couple thousand fairly
specified amino acids working together at the same time. You can use
duplication, point mutation, translocation, frame shifts, etc., and
they will all fail to get you a new type of function that goes very
far beyond the lowest levels of functional complexity toward any new
type of function. Simple up-regulation of what you already have will
only get you so far. Evolving new sequences with new functions just
doesn't happen beyond very low levels of functional complexity.

> The sequence space that is *relevant* to evolutionary mechanism is the
> sequence space encoded by the proteins in that organism.

Not so. The sequence space that is relevant to the evolution of a
particular gene pool is the sequence space that surrounds all possible
beneficial functions that could be used by that type of organism in
its current environment at various levels of functional complexity.
Remember, the sequence space changes exponentially depending upon the
level of complexity in question.

> That is, the
> only relevant question is if, in *this* non-random sequence space, there
> is a sequence x number of changes away from a selectable functionality
> required by an environmental change.

And the lower the level of functional complexity the more of such
functions there will be within a couple steps of what happens to
already be there in the gene pool. However, regardless of the
starting point, the average distance to those functions at higher and
higher levels of complexity increases exponentially. This means that
no matter what life form or gene pool you start with, it will be
limited in its evolutionary potential to only those functions that
require, at minimum, no more than a couple thousand fairly specified
amino acids working together at the same time. The reason for this is
that you just will never find an organism that just happens to be only
a few steps away from any beneficial function at such a level of
complexity even if you tried out zillions of organisms and gene pools
over zillions of years of time.

> The sequence space in any given
> organism is most certainly not *even* a random sample of total sequence
> space. Far from it. Almost every one of the proteins has some
> functional utility already. That alone makes your "ratio", based as it
> is on a denominator of *total sequence space*, GIGO.

Certainly the proteins that a creature starts with are all pretty much
beneficial - in good working order. I don't understand how you think
that I am arguing against this concept. I'm clearly not arguing
against this at all. In fact I use this concept as the basis for my
argument. You start with something that is fully functional. All the
proteins are working in a very beneficial way. Now, get something new
- a new type of function. That is the goal of evolution, new types of
beneficial functions. For a while evolution does pretty well - at the
lowest levels of functional complexity. However, with each step up
the ladder of functional complexity (i.e., each additional fairly
specified minimum amino acid requirement), evolution does
exponentially worse and worse until it completely stalls out well
before the level of just a couple thousand fairly specified amino
acids are reached. New functions at such a level and beyond just do
not evolve in any creature no matter what their original starting
point was and no matter if trillions upon trillions of years are
provided.

Sean
www.naturalselection.0catch.com

howard hershey
Dec 30, 2003, 12:21:00 PM

Sean Pitman wrote:
> howard hershey <hers...@indiana.edu> wrote in message news:<bscr3q$12r$1...@hood.uits.indiana.edu>...
>
>
>>Sean, to make a long story short, the ratio of beneficial versus
>>non-beneficial you give is utterly irrelevant to anything. The
>>denominator of your ratio is always *total sequence space* based on
>>taking 1/20 to the power of the total number of amino acids (or minimum
>>number, which, in those cases you want to be unevolvable, is always the
>>same as total number of amino acids).
>
>
> Yes.

"Yes." to what? To the fact that the 'minimum number of amino acids' is
*arbitrarily* determined to be the same as 'total number of amino acids'
(or insignificantly different from it) whenever you *arbitrarily* want
that protein system to be unevolvable? Or to the fact that you are
creating a denominator which pretends that evolution works by searching
through total sequence space?

> I'm interested in the total number of amino acids required for a
> particular type of function to be realized at its most minimum
> beneficial level of function. This level is very different depending
> on the type of function in question.

No. It is only a 'very different' level when you want a system to be
declared 'unevolvable'. In every system where you cannot dispute that a
new function evolved (by one or a few mutational steps), you arbitrarily
declare that, because this change did not require "thousands of amino
acids", it does not count as whatever you think evolution involves and
one does not make the same calculation with a denominator that uses the
total amino acid number.

> Those types of new functions
> that require more than a couple thousand fairly specified amino acids
> working together at the same time simply do not evolve no matter what
> you start with, functional or not.

How do you go about determining the difference between 'those types of
new functions that require more than a couple of thousand fairly
specified amino acids working together at the same time' and 'those
types of new functions' that don't? I cannot think of *any* system that
requires "a couple of thousand fairly specified amino acids working
together at the same time". Can you (and you actually need to *justify*
the claim)?

>>That denominator is utterly
>>irrelevant *unless* your model is that every new functional protein
>>arises by a random walk from a random protein sequence. The *real*
>>model of evolution assumes a quite different mechanism, the modification
>>of a pre-existing protein (or duplicate thereof) in a specific organism.
>
>
> That's exactly the model that I'm talking about.

I agree that the *first* model is the one you have been using.

> Starting with
> pre-existing proteins having pre-existing individual and collective
> beneficial functions, you will not see the evolution of any new type
> of protein function that requires more than a couple thousand fairly
> specified amino acids working together at the same time.

Why would anyone expect to see this? I cannot think of any realistic
evolutionary model of any biological system that requires a couple of
thousand amino acid changes.

> You can use
> duplication, point mutation, translocation, frame shifts, etc., and
> they will all fail to get you a new type of function that goes very
> far beyond the lowest levels of functional complexity toward any new
> type of function.

All evolution is at the lowest levels of functional complexity. All
evolution involves duplication, point mutation, translocation, frame
shifts, etc.

> Simple up-regulation of what you already have will
> only get you so far. Evolving new sequences with new functions just
> doesn't happen beyond very low levels of functional complexity.

One simply does not evolve "new sequences" by starting with a protein
which is thousands of amino acids away from the end point before there
is any selectable function. Your hypothetical system simply does not
exist in nature.

>>The sequence space that is *relevant* to evolutionary mechanism is the
>>sequence space encoded by the proteins in that organism.
>
>
> Not so. The sequence space that is relevant to the evolution of a
> particular gene pool is the sequence space that surrounds all possible
> beneficial functions that could be used by that type of organism in
> its current environment at various levels of functional complexity.

Nope. The sequence space that is relevant to evolutionary mechanisms is
the sequence space immediately near (within a few mutational steps of)
the existing genome in that organism. Period. End of story.

> Remember, the sequence space changes exponentially depending upon the
> level of complexity in question.

Since you are unable to quantify "level of complexity", the point is moot.

>>That is, the
>>only relevant question is if, in *this* non-random sequence space, there
>>is a sequence x number of changes away from a selectable functionality
>>required by an environmental change.
>
>
> And the lower the level of functional complexity the more of such
> functions there will be within a couple steps of what happens to
> already be there in the gene pool. However, regardless of the
> starting point, the average distance to those functions at higher and
> higher levels of complexity increases exponentially.

The *only* way this would make sense is if your 'new' function were a
teleologically determined goal toward which all changes were made. That
is not how evolution works.

> This means that
> no matter what life form or gene pool you start with, it will be
> limited in its evolutionary potential to only those functions that
> require, at minimum, no more than a couple thousand fairly specified
> amino acids working together at the same time. The reason for this is
> that you just will never find an organism that just happens to be only
> a few steps away from any beneficial function at such a level of
> complexity even if you tried out zillions of organisms and gene pools
> over zillions of years of time.

You have not even convinced me that there *are* functions that require
"a couple of thousand fairly specified amino acids", all of which must
be 'just so' so that all of them work "together at the same time". You
have proposed that the bacterial flagellum is such a system, but I have
yet to see how you calculated the 'minimum number of amino acids' needed
and what effect the existence of a TTSS as a precursor system would have
on that 'minimum number of amino acids', given that many of the proteins
perform the same secondary functions in both (e.g., bind to the same or
very similar other protein, export proteins, etc.).

>>The sequence space in any given
>>organism is most certainly not *even* a random sample of total sequence
>>space. Far from it. Almost every one of the proteins has some
>>functional utility already. That alone makes your "ratio", based as it
>>is on a denominator of *total sequence space*, GIGO.
>
>
> Certainly the proteins that a creature starts with are all pretty much
> beneficial - in good working order. I don't understand how you think
> that I am arguing against this concept.

I am pointing out that sequence space which is not a random sample of
total sequence space has no relationship to the denominator you use in
your calculations. That denominator is predicated upon the phoney,
irrelevant straw man idea that evolution proceeds by starting with a
sequence thousands of amino acids away from the particular teleologic
system you think must be the goal. And that getting to that teleologic
end point from the starting point of a random sequence involves a random
walk without any possibility of intermediate functionality. That is a
dishonest representation of evolution on several points: 1) That the
starting point is an effectively random sequence thousands of amino
acids away from the end point. 2) That there are no possible states of
intermediate utility between the random starting point and the end
point. 3) That the end point is a teleological end point.

> I'm clearly not arguing
> against this at all. In fact I use this concept as the basis for my
> argument. You start with something that is fully functional. All the
> proteins are working in a very beneficial way. Now, get something new
> - a new type of function. That is the goal of evolution, new types of
> beneficial functions. For a while evolution does pretty well - at the
> lowest levels of functional complexity.

And it does so without needing to change 'thousands of amino acids'.
Part of the reason is because *no* single, *immediate* function of any
single protein's or protein complex's *active sites* involves thousands
of amino acids. Each active site on a protein or on protein complexes
involves only a few amino acids. Only a few amino acids on a protein
are involved in binding a substrate or another protein. Larger
'functions' involve a number of *independent* binding sites and active
sites, each of which can evolve *independently* of the other sites and
do so in a *stepwise* fashion. Because of the *independence* of these
sites, states of *intermediate* utility are possible. One can have an
intermediate state of functional utility that does not involve *all* the
proteins of the current flagella. And then one can change a different
protein that has an independent functionality so that it binds to the
complex of the proto-flagella that has a non-flagellar functional
utility by changing, not thousands, but only a few, amino acids.

> However, with each step up
> the ladder of functional complexity (i.e., each additional fairly
> specified minimum amino acid requirement), evolution does
> exponentially worse and worse until it completely stalls out well
> before the level of just a couple thousand fairly specified amino
> acids are reached. New functions at such a level and beyond just do
> not evolve in any creature no matter what their original starting
> point was and no matter if trillions upon trillions of years are
> provided.

We can both agree that they would not evolve *if* evolution worked the
way you claim it does. But that is a bogus straw man version of
evolution that has no relationship to the *real* model of evolution.
That is, your calculations (particularly the denominator) are utterly
irrelevant GIGO nonsense.

Your straw man model of evolution is basically the molecular equivalent
of the idea of a lizard giving birth to a bird. That is, the idea that
a randomly chosen animal must be able to give birth to another
teleologically determined random functional animal without any
possibility of a functional intermediate. You can see a lizard giving
birth to a slightly modified lizard (say with extra claws). But you
just cannot see a lizard giving birth to a bird. And since we cannot
produce this in the lab in five years, that proves that evolution
doesn't work. GIGO and more GIGO.
>
> Sean
> www.naturalselection.0catch.com
>

Sean Pitman
Dec 30, 2003, 12:25:23 PM
RobinGoodfellow <lmuc...@yahoo.com> wrote in message news:<bsd7ue$r1c$1...@news01.cit.cornell.edu>...


> It is even worse than that. Even random walks starting at random points
> in N-dimensional space can, in theory, be used to sample the states
> with a desired property X (such as Sean's "beneficial sequences"), even
> if the number of such states is exponentially small compared to the
> total state space size.

This depends upon just how exponentially small the number of
beneficial states is relative to the state space. It also depends
upon how fast this space is searched through. For example, if the
ratio of beneficial states to non-beneficial states is as high as, say,
1 in 1e12, and if 1e9 states are searched each second, how long will
it take, on average, to find a new beneficial state? It will take
just over 1,000 seconds - a bit less than 20 minutes on average. But,
what happens if at higher levels of functional complexity the density
of beneficial functions decreases exponentially with each step up the
ladder? The rate of search stays the same, but the junk sequences
increase exponentially and so the time required to find the rarer and
rarer beneficial states also increases exponentially.
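
The arithmetic in this paragraph, spelled out (the 1-in-1e12 ratio, the
1e9-per-second search rate, and the 20-fold shrinkage per additional
specified position are all illustrative assumptions, and the replies
below dispute their relevance):

rate = 1e9            # states examined per second (assumed)
ratio = 1e-12         # beneficial fraction of the space (assumed)
for extra_positions in range(4):
    mean_wait = 1.0 / (ratio * rate)   # mean seconds to the first hit
    print(extra_positions, mean_wait, "seconds")
    ratio /= 20.0     # one more fairly specified position, per the argument
# 0 -> 1e3 s (~17 min), 1 -> 2e4 s, 2 -> 4e5 s, 3 -> 8e6 s (~93 days)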

> Such random walks are at the heart of
> Monte-Carlo methods, used to solve a wide variety of problems in
> physics, statistics, computer science, etc. The time requirements
> for such a random walk would depend on the distribution of valid states
> (i.e. "beneficial sequences") in the space, the transition probabilities
> between each state, and, to a lesser extent, the starting point.

Exactly. And the density of beneficial sequences is inversely
related, in an exponential manner, to the level of minimum
informational complexity required for these functions to work at a
minimum level of beneficial function.

> Of
> course, the size of each state (i.e. the dimension of the space) is also
> a factor, but the key point (and what makes Monte-Carlo techniques so
> useful) is that the relationship between time and state size need not be
> exponential. Depending on the specific details described above, the
> time requirement may be any function of state size - possibly even a
> linear function!

No, go and check these formulas again and then show me how they are
"linear" with increasing minimum state sizes. They are not linear,
but exponential relationships. However, even if they actually were
linear as you suggest, this would still pose a significant problem to
evolution beyond a certain point of informational complexity. Even a
linear decrease in density with increasing minimum space size would
result in a linear increase in required time to find new functions at
that level of complexity.

> Again, I totally agree that a simple Monte-Carlo
> process is a pitiful model for evolution - but Sean's statistics are way
> off even when applied to this model.

I fail to see how you have supported this statement of yours. My
statistics do seem to match not only the exponentially increasing
ratios found in language systems like English and information systems
like functional proteins and genes, but they also match statistical
programs used in computer software development and the like. This is
why computers cannot evolve their own software programs beyond the
lowest levels of functional complexity. To go very far beyond the
informational complexity that they already have they require the
intelligence and creativity of human programmers to get across these
vast neutral gaps that simply cannot be searched out in any sort of
reasonable amount of time by mindless processes. So, please do show me
how your Monte-Carlo technique can search an increasing state space
and find beneficial states in a linear fashion with each increase in
the minimum informational complexity requirement.

> His probabilities would only be
> valid if evolution worked by repeatedly generating N-amino-acid
> sequences de novo *every time*, with selection only keeping "beneficial"
> sequences. That is, Sean's calculation reflects the probability of
> finding a desired state with property X by blind, uniform random
> sampling of N-dimensional space. The best thing that could be said
> about such an attempt to model evolution is that it is laughable.

How is this laughable when you evolutionists can't seem to come up
with any other way to explain how new types of functions that require
at least a couple thousand fairly specified amino acids working
together at the same time can evolve? What method do you propose to
explain such levels of functional diversity within living things? How
do you get from one type of function at such a level to another type
of function within this same level of specified complexity?

> But
> it appears that Sean does not realize that he is making this mistake,
> and keeps repeating his probabilities like a broken record.

And you guys keep repeating your non-supported assertions like a
mantra. You keep saying I'm crazy and that my ideas are laughable,
but you have presented nothing to significantly counter my position.
My hypothesis remains untouched and my predictions still hold. What
have you presented besides a bunch of non-supported "just-so" and
"trust me" statements? Where is your falsifiable evidence? The best
that I can see is that you guys keep falling back on the philosophical
position that given enough time anything is possible via the
extraordinary creativity of The Mindless - even beyond the most
miraculous creations of mankind.

Basically evolution explains everything - even without demonstration -
and therefore nothing. It is a weak historical hypothesis at best.
It is not falsifiable by any sort of real time genetic experiment -
such as a Pasteur-like experiment. Every time a prediction fails, you
evolutionists just fall back on your philosophy and say, "Oh well, I
guess that particular level of evolution just requires millions of
years - but it certainly happened within 4 billion years that's for
sure!" Really, there is no way to falsify such a philosophical
position. Statistically you have nothing. Statistically it is very
clear that evolution, as an explanation for the variety and levels of
functional complexity that we find in all living things, is simply
untenable. Of course, you are free to hold whatever philosophical
position that you want, but if you hope to convince those who actually
wish to consider the statistical problems involved, you will have to
do much better than you have done so far to hold onto your illusions
of "scientific superiority".

Sean
www.naturalselection.0catch.com

Howard Hershey
Dec 31, 2003, 3:29:00 PM

Sean Pitman wrote:
>
> RobinGoodfellow <lmuc...@yahoo.com> wrote in message news:<bsd7ue$r1c$1...@news01.cit.cornell.edu>...

Talking with Sean *really* is like an absurdist play written by Beckett.
Let's call it "Evidence for Godot."

Waiting bum #1: See that cow at the top of the hill. God lifted him up there.

Waiting bum #2: Huh? What makes you think that there is no natural
mechanism that would allow a cow to reach the top of the hill?

Bum #1: That hill must be a 'thousand' feet high. There is no way that
a cow can jump a thousand feet from the valley to the top of the
mountain. Cows simply do not have the muscle power to do that. I can
prove it. Therefore it is impossible for a cow to reach the top by
randomly jumping up with no possible places of intermediate resting. A
cow simply cannot jump up a thousand times and rest in thin air to make
the next jump. Thus, this proves that it is impossible for a cow to
reach the top of the mountain.

Bum #2: What makes you think that the mechanism involved jumping
directly from the valley to the top of the mountain with no possible
intermediate stopping points?

Bum #1: The cow would have to leap a 'thousand' feet high to reach the
top of that mountain. The taller the mountain, the higher the cow has
to jump. I can maybe see a cow jumping 480 feet, but a thousand feet is
exponentially more difficult. And 2,000 feet is impossible. Nope.
There is no way that a cow can jump even a thousand feet in a single
bound with no intermediate resting places. And the fact that you cannot
produce any evidence of a cow jumping a thousand feet in the air in 30
seconds is evidence that it is impossible for a cow to reach the top of
the mountain unless God gives it a lift.

Bum #2: Well, I agree. The *mechanism* you propose *would* be
impossible. But what is to prevent a cow reaching the top of the
mountain from a place one (or even two) foot below the top? What is to
prevent the cow from walking (even randomly) up the hill in a stepwise
fashion? There seem to be a number of resting places on the side of the hill.

Bum #1: Well, I don't see any cow one foot below the top, do you?
Therefore it is impossible for a cow to ever have been one foot below
the top. And the cow would have had to jump from the valley to that
spot just below the top in any case. What evidence do you have that
there are intermediate places where cows have been?

Bum #2: There is a potential pathway up the hill that I can see. And
there are cattle footprints at some of the intermediate sites that look
just like the footprints of the cow at the top.

Bum #1: You keep waving these hypothetical pathways and hoofprint
similarities as if they were actual evidence that the cow at the top
passed that way. What you need to do to show that a cow can reach the
top of the mountain naturally is to have that cow down there jump up to
the top in 30 seconds with no intermediate resting places. Now that
*would* be proof that a natural mechanism is possible.

Bum #2: That's silly. I am explicitly rejecting your mechanism in
favor of a quite different, but still natural, one.

Bum #1: But you still haven't demonstrated that a cow can jump from the
bottom of the hill to the top in a single bound. Therefore God must
have lifted the cow from the valley to the hilltop.

Bum #2: Well, if my idea is true, I would expect to see some evidence
that a cow has passed this way, some evidence that a cow was at
intermediate steps along the pathway that I can see. Perhaps cow
patties. Even a pile of shit would be more evidence than you are
presenting for your Goddidit alternative. And I certainly agree that
your 'natural' alternative mechanism of a single leap is a pile of shit
and unlikely.

Bum #1: I am quite satisfied that at heights of 480 feet, a cow can jump
to the top of the mountain, but that the process gets much more
difficult when the height is 1,000 feet and completely impossible at
2,000. The difficulty of the jump increases exponentially with height.

Bum #2: I don't even think 480 feet is doable by the 'natural'
mechanism you propose, which is merely God lifting the animal from
valley to mountaintop, but without the God. I think an entirely
different mechanism (stepwise walking upslope) was involved. Moreover,
I bet there is some evidence (e.g., even a simple pile of shit would be
more than you present) that the stepwise mechanism is the correct one.
I think I will walk up the mountain myself and see if there is such evidence.

Bum #1: You won't find any because the cow had to get up the mountain
in one giant leap for cowkind. I will wait here for Godot to lift a cow
up again.

Exit Bum #2. AFAIK, Bum #1 is still waiting for Godot to show him a miracle.


>
> > It is even worse than that. Even random walks starting at random points
> > in N-dimensional space can, in theory, be used to sample the states
> > with a desired property X (such as Sean's "beneficial sequences"), even
> > if the number of such states is exponentially small compared to the
> > total state space size.
>
> This depends upon just how exponentially small the number of
> beneficial states is relative to the state space.

Your argument is also completely irrelevant. All calculations based on
such a denominator are GIGO and irrelevant to any real evolutionary
mechanisms. The only time such a calculation would have any value at all
would be wrt the initial steps in abiogenesis. At that point, the
frequency at which specific relevant 'functions' like polynucleotide
kinase activity or RNA ligase activity occur in randomly generated RNAs
of, say, 50 nt lengths, *would* be relevant. The *experimental*
evidence indicates that such activities (not optimal, but in the kingdom
of the blind...) occur with a frequency of 1/10^16 or 1/10^17 molecules.
That means that *all* of these functions/activities would be present in
less than a micromole (a mole is Avogadro's number, about 6 x 10^23
molecules) of such randomly generated RNA molecules.
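
A quick check of that micromole figure (Avogadro's number rounded; the
one-in-1e16 to one-in-1e17 activity frequencies are the experimental
values cited above):

AVOGADRO = 6.022e23   # molecules per mole

for molecules_per_hit in (1e16, 1e17):
    micromoles = molecules_per_hit / AVOGADRO * 1e6
    print("1 in %.0e -> %.2f micromoles per expected hit"
          % (molecules_per_hit, micromoles))
# 1e16 -> 0.02 micromoles; 1e17 -> 0.17 micromoles; both under a micromole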

> It also depends
> upon how fast this space is searched through. For example, if the
> ratio of beneficial states to non-beneficial states is as high as, say,
> 1 in 1e12, and if 1e9 states are searched each second, how long will
> it take, on average, to find a new beneficial state?

Utterly irrelevant unless one assumes that evolution works by starting
with a random sequence or random unrelated protein and engages in a
completely random walk to the one function. No intelligent person
thinks that is how evolution works.

[snip more stuff that is utterly devoid of any possible relevance to evolution.]



> And you guys keep repeating your non-supported assertions like a
> mantra. You keep saying I'm crazy and that my ideas are laughable,
> but you have presented nothing to significantly counter my position.
> My hypothesis remains untouched and my predictions still hold.

Your hypothesis - "*IF* evolution works by starting with a random
sequence or a random unrelated protein and proceeds by a long random
walk with no intermediate states of utility, evolution will not happen."
- is indeed untouched, and predictions based on those assumptions will
hold. The problem is that that hypothesis is utterly irrelevant to the
way evolution *does* work and is belied by the very structure of genomes
and the relationships between protein sequences, which indicate a
different, stepwise mechanism with intermediate stages of functional
utility.

> What
> have you presented besides a bunch of non-supported "just-so" and
> "trust me" statements? Where is your falsifiable evidence?

Where is yours? You need to demonstrate that some system *must* evolve
by the mechanism you propose. That means you need to find a system
where no possible intermediate of utility can exist, find a system which
you know, for a fact, *must have* started with a sequence completely
unrelated to the current sequence and in which only the end result has
selectable activity of *any* sort, and you need to find, demonstrate,
and quantitate how you can determine a 'minimum amino acid number'. We
keep pointing out that the problem with your numbers is that it is
predicated on a false straw man version of evolution. And we keep
pointing out that one does not need to *change* thousands of amino acids
to convert, say, a TTSS to add a crude motility function. Just as one
does not need to *change* thousands of (nor even 480) amino acids to
convert ebg into a protein that has the lactase activity it did not
previously have. Nor does one need to *change* thousands of amino acids
to generate a 'new' motility function in the swarming bacteria. All you
have presented is vague ideas about 'thousands of fairly specified amino
acids' and random walks from random sequences that are utterly
irrelevant.

> The best
> that I can see is that you guys keep falling back on the philosophical
> position that given enough time anything is possible via the
> extraordinary creativity of The Mindless - even beyond the most
> miraculous creations of mankind.

No. We fall back on the real mechanisms of evolution, which bear no
relationship to the ideas included in your calculations. Ideas about
modification of existing proteins of relevance, like modification of
substrate binding in ebg without modification of the active site. Like
duplication and divergence that produced both the specialized (and
without lactase function) ebg and the specialized (and without ebg
function lacYZA operon). Like internal duplication (as in the human
lactase). Like multiple functionality (like beta-galactosidase). Like
chimera formation and independent utility of different motifs in a
protein (as in the human lactase).


>
> Basically evolution explains everything - even without demonstration -
> and therefore nothing. It is a weak historical hypothesis at best.

Historical hypotheses falsifiably *predict* what is observed (common
descent in sequences, evolution in families, etc.). Intelligent design
predicts whatever one wants it to predict and thus predicts nothing.

> It is not falsifiable by any sort of real time genetic experiment -
> such as a Pasteur-like experiment.

Events that cannot be demonstrated by real time experiments (which
include planet formation and stellar formation and geological layer
formation) are, nonetheless, falsifiable. They are falsifiable by virtue
of making specific predictions which can be tested against observation.

> Every time a prediction fails, you
> evolutionists just fall back on your philosophy and say, "Oh well, I
> guess that particular level of evolution just requires millions of
> years - but it certainly happened within 4 billion years that's for
> sure!"

Rates are important. So are observations. The evidence indicates that
ebg and lac operons (and other members of the family 2 glycoside
hydrolases) are not independent events with each happening by random
chance starting from a random sequence. The evidence indicates that
many of the proteins in bacterial flagella are not independent of
related proteins with independent functionality; that is, they did not
arise from a random sequence. All this evidence specifically tells us
that evolution does NOT involve starting with a random sequence and does
NOT proceed by a random walk through useless sequence space. That is, it
tells us that your argument is against a straw man.

> Really, there is no way to falsify such a philosophical
> position. Statistically you have nothing. Statistically it is very
> clear that evolution, as an explanation for the variety and levels of
> functional complexity that we find in all living things, is simply
> untenable.

No. It tells us that a straw man evolution that works by starting each
protein as a random sequence and having it go to a teleologic end point
via a random walk is untenable. Since that idea is unrelated to any
real evolutionary mechanism, it is irrelevant and the calculations based
upon these assumptions are mere GIGO.

> Of course, you are free to hold whatever philosophical
> position that you want, but if you hope to convince those who actually
> wish to consider the statistical problems involved,

But we are considering the statistical problems you pose and continually
point out that they are irrelevant GIGO based on a straw man idea of
evolution. They don't *even* have relevance to abiogenesis (which did
not involve proteins encoded in DNA). They are utterly without any
redeeming value when applied to recently evolved systems like 'lactase'
or 'motility' or 'heme-binding electron transfer'.

> you will have to
> do much better than you have done so far to hold onto your illusions
> of "scientific superiority".

You have to do much better to present any sort of scientific idea at all.
>
> Sean
> www.naturalselection.0catch.com

Sean Pitman
Dec 31, 2003, 7:18:49 PM
howard hershey <hers...@indiana.edu> wrote in message news:<bssc8f$5sb$1...@hood.uits.indiana.edu>...

> Sean Pitman wrote:
> > howard hershey <hers...@indiana.edu> wrote in message news:<bscr3q$12r$1...@hood.uits.indiana.edu>...
> >
> >>Sean, to make a long story short, the ratio of beneficial versus
> >>non-beneficial you give is utterly irrelevant to anything. The
> >>denominator of your ratio is always *total sequence space* based on
> >>taking 1/20 to the power of the total number of amino acids (or minimum
> >>number, which, in those cases you want to be unevolvable, is always the
> >>same as total number of amino acids).
> >
> >
> > Yes.
>
> "Yes." to what?

Perhaps if you read the very next sentence before writing a whole
non-pertinent paragraph you would have your answer . . .

<snip>

> > I'm interested in the total number of amino acids required for a
> > particular type of function to be realized at its most minimum
> > beneficial level of function. This level is very different depending
> > on the type of function in question.
>
> No. It is only a 'very different' level when you want a system to be
> declared 'unevolvable'.

Not at all. Take for example the cytochrome c function. At minimum
this type of function appears to require at least 80 or so amino acids
in a fairly specified order. This is not so for the lactase function.
Can a functional lactase enzyme work in a beneficial manner in any
life form with only 80 amino acids? I don't think so. It seems like
the lactase function requires more than 400 fairly specified amino
acids (though less specified than in the cytochrome c function) before
this type of function can be realized. Now, can you get the flagellar
motility type of function with only 400 coded amino acid positions
working together at the same time? I don't think so. This type of
function seems to require at least 4,000 to 6,000 fairly specified
amino acids working together at the same time in order for the bare
minimum level of beneficial function of this type to be realized.

So you see, different types of cellular functions do indeed require
different numbers of amino acids at minimum as well as different
degrees of minimum amino acid specificity. You simply cannot deny
this obvious fact. You can try to cover it up with a lot of hand
waving and smoke blowing, as you have tried so valiantly to do, but I
don't think you have a very easy job since this idea is so obviously
true. It is difficult to cover up and dismiss something so clear and
obvious as this. Your efforts to do so only make it clearer.

> In every system where you cannot dispute that a
> new function evolved (by one or a few mutational steps), you arbitrarily
> declare that, because this change did not require "thousands of amino
> acids", it does not count as whatever you think evolution involves and
> one does not make the same calculation with a denominator that uses the
> total amino acid number.

I haven't made any sort of arbitrary declaration. As it turns out,
the only examples that you evolutionists have come up with as
"real-time" examples of evolution in action have not required anything
more than a few hundred loosely specified amino acids working together
at the same time. None of your examples of functions using thousands
of amino acids at the same time actually require that all of these
amino acids be there for minimum beneficial function of that type to
be realized. It is my hypothesized position that the reason why your
examples were only one or two mutational steps away from success is
because the relative density of beneficial sequences at such low
levels of functional complexity is rather high. However, when you
start talking about systems of function that require, at minimum, a
few thousand fairly specified amino acids working together at the same
time, you simply run out of examples because there just aren't any. I
know this must be frustrating for you, but that is the cold hard fact
of the matter. You evolutionists just don't have anything that goes
very far beyond the lowest levels of functional complexity.

> > Those types of new functions
> > that require more than a couple thousand fairly specified amino acids
> > working together at the same time simply do not evolve no matter what
> > you start with, functional or not.
>
> How do you go about determining the difference between 'those types of
> new functions that require more than a couple of thousand fairly
> specified amino acids working together at the same time' and 'those
> types of new functions' that don't? I cannot think of *any* system that
> requires "a couple of thousand fairly specified amino acids working
> together at the same time". Can you (and you actually need to *justify*
> the claim)?

Yes I can and I have presented such examples over and over again -
such as the flagellar system of bacterial motility. This type of
function requires, at minimum, at least 20 different kinds of proteins
working together at the same time. Each of these proteins is composed
of around 300 fairly specified amino acids on average. This works out
to around 6,000aa collective, novel, fairly specified amino acid
positions working together at the same time for this type of function
to be realized at a minimum level of beneficial selectability.

> >>That denominator is utterly
> >>irrelevant *unless* your model is that every new functional protein
> >>arises by a random walk from a random protein sequence. The *real*
> >>model of evolution assumes a quite different mechanism, the modification
> >>of a pre-existing protein (or duplicate thereof) in a specific organism.
> >
> >
> > That's exactly the model that I'm talking about.
>
> I agree that the *first* model is the one you have been using.

No - I'm taking your *second* "true model" of evolution as "true". I
agree that the real model of evolution assumes the modification of a
pre-existing protein (or duplicate thereof) in a specific organism.
That is the model that I'm talking about. That is the model that
cannot evolve very far beyond the lowest levels of complexity from
what it started with. I'm using your model Howard - your "real" model
of evolution. I haven't made up a new model at all. What I am saying
is that your "real" model doesn't work like you think it does.

> > Starting with
> > pre-exiting proteins having pre-existing individual and collective
> > beneficial functions, you will not see the evolution of any new type
> > of protein function that requires more than a couple thousand fairly
> > specified amino acids working together at the same time.
>
> Why would anyone expect to see this? I cannot think of any realistic
> evolutionary model of any biological system that requires a couple of
> thousand amino acid changes?

I'm hitting my head against a brick wall here! Come on man! Try and
understand what I'm saying. The system that requires a couple
thousand amino acids at minimum most likely does not require a couple
thousand amino acid changes, starting with something that is already
there, to be realized. However, on average, such a system would
probably require several hundred fairly specified amino acid position
changes to what is already there. Then, a system that requires 10,000
fairly specified amino acids at minimum would probably require, on
average, perhaps as many as 1,000 fairly specified amino acid position
changes. If just 10% of these were neutral changes, on average, a
large colony numbering in the trillions would still take trillions
upon trillions upon trillions of years to evolve even one new type of
function at such a level of minimum informational complexity
(complexity = minimum sequence size plus minimum sequence
specificity).
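
For those who want the arithmetic spelled out, here is a minimal
sketch of the waiting-time estimate. The population size and the
number of variants tried per organism per year are illustrative
assumptions, not measured values, and the model assumes the neutral
gap positions must be matched by unguided sampling:

    # Log10 of the years needed to search a neutral gap of `gap` fairly
    # specified positions (20 residues each) by blind sampling, given an
    # assumed population and per-organism trial rate.
    import math

    def log10_years_to_cross(gap, population=1e12,
                             trials_per_org_per_year=1e6):
        log_space = gap * math.log10(20)   # log10 of 20^gap sequences
        log_rate = math.log10(population * trials_per_org_per_year)
        return log_space - log_rate

    for gap in (4, 40, 100):
        print(f"gap of {gap:>3} neutral changes: "
              f"~1e{log10_years_to_cross(gap):.0f} years")

On these assumptions a gap of 3 or 4 neutral changes is crossed
essentially instantly (the log comes out negative), a 40-change gap
takes about 1e34 years, and the 100-change gap implied by the 10,000aa
example above takes about 1e112 years.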

> > You can use
> > duplication, point mutation, translocation, frame shifts, etc., and
> > they will all fail to get you a new type of function that goes very
> > far beyond the lowest levels of functional complexity toward any new
> > type of function.
>
> All evolution is at the lowest levels of functional complexity.

I couldn't have said it better myself . . .

> All
> evolution involves duplication, point mutation, translocation, frame
> shifts, etc.

Exactly . . .

> > Simple up-regulation of what you already have will
> > only get you so far. Evolving new sequences with new functions just
> > doesn't happen beyond very low levels of functional complexity.
>
> One simply does not evolve "new sequences" by starting with a protein
> which is thousands of amino acids away from the end point before there
> is any selectable function. Your hypothetical system simply does not
> exist in nature.

You're wrong. Such systems do exist in nature and in every living
thing. The average distance to simple functions requiring just 100 or
so loosely specified amino acid sequences may be only 3 or 4 neutral
amino acid changes wide. However, those types of functions that
require a minimum sequence of 1,000aa are separated by much wider
neutral gaps from everything that a given cell has - an average of,
say, 30 or 40 neutral positional changes (i.e., representing an
average neutral gap of sequence space of over 1e50 sequences). Then,
when you get up to those functions that require several thousand
fairly specified amino acids at minimum, the average gap may grow to
500 or so neutral changes on average (sequence space of 1e650). Are
you starting to see the problem here?
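
The sequence-space figures just quoted come from treating a gap of d
neutral positional changes as roughly 20^d candidate sequences. A
minimal sketch of that bookkeeping:

    # Sequence-space volume implied by a neutral gap of d positional
    # changes, counting 20 possible residues at each changed position.
    import math

    for d in (30, 40, 500):
        print(f"gap of {d:>3} changes -> 20^{d} ~ 1e{d * math.log10(20):.0f}")

A 30-change gap spans about 1e39 sequences, a 40-change gap over 1e50,
and a 500-change gap about 1e650, which is where the figures above
come from.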

> >>The sequence space that is *relevant* to evolutionary mechanism is the
> >>sequence space encoded by the proteins in that organism.
> >
> >
> > Not so. The sequence space that is relevant to the evolution of a
> > particular gene pool is the sequence space that surrounds all possible
> > beneficial functions that could be used by that type of organism in
> > its current environment at various levels of functional complexity.
>
> Nope. The sequence space that is relevant to evolutionary mechanisms is
> the sequence space immediately near (within a few mutational steps of)
> the existing genome in that organism. Period. End of story.

That would be great if it were true. The fact is that on average, as
you move up the ladder of functional complexity, there are
exponentially fewer and fewer starting points that are anywhere near
any other type of beneficial functional sequence at that level of
functional complexity. There just aren't any sequences within the
gene pool that are only one or two steps away from a new type of
function at such levels of complexity. That is why you don't have any
examples of real time evolution at such levels of complexity. It just
doesn't happen. Period. End of Story.

> > Remember, the sequence space changes exponentially depending upon the
> > level of complexity in question.
>
> Since you are unable to quantify "level of complexity", the point is moot.
>
> >>That is, the
> >>only relevant question is if, in *this* non-random sequence space, there
> >>is a sequence x number of changes away from a selectable functionality
> >>required by an environmental change.
> >
> >
> > And the lower the level of functional complexity the more of such
> > functions there will be within a couple steps of what happens to
> > already be there in the gene pool. However, regardless of the
> > starting point, the average distance to those functions at higher and
> > higher levels of complexity increases exponentially.
>
> The *only* way this would make sense is if your 'new' function were a
> teleologically determined goal toward which all changes were made. That
> is not how evolution works.

Not at all. I'm not talking about any one particular type of
function, but about all types of functions within a given level of
complexity. No new type of function within that level of higher
complexity (requiring a few thousand fairly specified amino acids at
minimum), will be able to evolve given what a genome has to start
with. This is because, on average, what a given genome has to start
with will be hundreds and even thousands of neutral fairly specified
amino acid positional changes away from all other types of functions
within that level of complexity.

Anyway, this is all I have time for today . . .

Sean
www.naturalselection.0catch.com

Howard Hershey

Dec 31, 2003, 8:22:16 PM

Sean Pitman wrote:
>
> howard hershey <hers...@indiana.edu> wrote in message news:<bssc8f$5sb$1...@hood.uits.indiana.edu>...
> > Sean Pitman wrote:
> > > howard hershey <hers...@indiana.edu> wrote in message news:<bscr3q$12r$1...@hood.uits.indiana.edu>...
> > >
> > >>Sean, to make a long story short, the ratio of beneficial versus
> > >>non-beneficial you give is utterly irrelevant to anything. The
> > >>denominator of your ratio is always *total sequence space* based on
> > >>taking 1/20 to the power of the total number of amino acids (or minimum
> > >>number, which, in those cases you want to be unevolvable, is always the
> > >>same as total number of amino acids).
> > >
> > >
> > > Yes.
> >
> > "Yes." to what?
>
> Perhaps if you read the very next sentence before writing a whole
> non-pertinent paragraph you would have your answer . . .
>
> <snip>
> > > I'm interested in the total number of amino acids required for a
> > > particular type of function to be realized at its most minimum
> > > beneficial level of function. This level is very different depending
> > > on the type of function in question.
> >
> > No. It is only a 'very different' level when you want a system to be
> > declared 'unevolvable'.
>
> Not at all. Take for example the cytochrome c function. At minimum
> this type of function appears to require at least 80 or so amino acids
> in a fairly specified order.

I have asked this repeatedly, so now I will shout. HOW THE BLOODY HELL
DO YOU DEFINE WHAT IS OR IS NOT 'FAIRLY SPECIFIED ORDER' AND WHAT VALUE
DO YOU GIVE IT?

> This is not so for the lactase function.
> Can a functional lactase enzyme work in a beneficial manner in any
> life form with only 80 amino acids?

You keep confounding total number of amino acids with number of amino
acids needed to perform a function.

> I don't think so. It seems that
> the lactase function requires more than 400 fairly specified amino
> acids (though less specified than in the cytochrome c function) before
> this type of function can be realized.

So, shouting again, HOW THE BLOODY HELL DID YOU CALCULATE A NUMBER OF
400 AMINO ACIDS? I STRONGLY SUSPECT THAT NUMBER IS NOTHING BUT A WAG
(WILD-ASSED GUESS).

> Now, can you get the flagellar
> motility type of function with only 400 coded amino acid positions
> working together at the same time? I don't think so. This type of
> function seems to require at least 4,000 to 6,000 fairly specified
> amino acids working together at the same time in order for the bare
> minimum level of beneficial function of this type to be realized.

AND AGAIN, HOW THE BLOODY HELL DID YOU CALCULATE THAT NUMBER? I
STRONGLY SUSPECT THAT NUMBER IS NOTHING BUT A WAG. Convince me otherwise.


>
> So you see, different types of cellular functions do indeed require
> different numbers of amino acids at minimum as well as different
> degrees of minimum amino acid specificity.

How the bloody hell can I tell? You simply toss these numbers out
without giving any justification for using them. You don't even tell me
how to determine the number of 'minimal amino acids' needed for
function. You don't even tell me what you mean by function. Is the
part of the human lactase that goes through the membrane performing a
different 'function' than the part that is actively cleaving the
substrate? Is the part that *binds* the substrate performing a
different function than the part that is actively acting as the
nucleophile? If I can change one without affecting the other, is that a
change in function?

> You simply cannot deny
> this obvious fact. You can try to cover it up with a lot of hand
> waving and smoke blowing, as you have tried so valiantly to do, but I
> don't think you have a very easy job since this idea is so obviously
> true. It is difficult to cover up and dismiss something so clear and
> obvious as this. Your efforts to do so only make it clearer.

You, dear boy, are doing the hand-waving here. You repeatedly toss out
numbers and words without making any attempt to operationally define
them or justify the resulting numbers you get.


>
> > In every system where you cannot dispute that a
> > new function evolved (by one or a few mutational steps), you arbitrarily
> > declare that, because this change did not require "thousands of amino
> > acids", it does not count as whatever you think evolution involves and
> > one does not make the same calculation with a denominator that uses the
> > total amino acid number.
>
> I haven't made any sort of arbitrary declaration. As it turns out,
> the only examples that you evolutionists have come up with as
> "real-time" examples of evolution in action have not required anything
> more than a few hundred loosely specified amino acids working together
> at the same time.

No. All examples of evolution in action involve the modification of
pre-existing systems. That is what descent with (and by) modification
means. It doesn't matter how many amino acids the pre-existing system has.

> None of your examples of functions using thousands
> of amino acids at the same time actually require that all of these
> amino acids be there for minimum beneficial function of that type to
> be realized.

Well, it is damn hard to find the systems you want amidst all the other
systems that evolve just fine. But systems that are thousands of amino
acids long (that is the only operational definition I see you using) can
be modified by a few mutational changes just as easily as systems less
than 100 amino acids long.

> It is my hypothesized position that the reason why your
> examples were only one or two mutational steps away from success is
> because the relative density of beneficial sequences at such low
> levels of functional complexity is rather high.

No. It is because our systems did not evolve via the mechanism you
claim must exist -- starting with a random sequence and proceeding via a
random walk. They started with an 'ancestral' sequence which was then
'modified' by a few simple mutations. The number of amino acids in the
'ancestral' or 'final' systems is completely irrelevant to that
mechanism. The reason the relative density of beneficial sequences is
dense is because of the non-random nature of the starting point. Period.

> However, when you
> start talking about systems of function that require, at minimum, a
> few thousand fairly specified amino acids working together at the same
> time, you simply run out of examples because there just aren't any.

There are damn few such systems at all. *Even* if you count every
single amino acid in the proteins as being 'fairly specified' (whatever
that means).

> I
> know this must be frustrating for you, but that is the cold hard fact
> of the matter. You evolutionists just don't have anything that goes
> very far beyond the lowest levels of functional complexity.

I have no idea how you *quantify* the level of functional complexity,
given that most of the active sites act independently of each other.


>
> > > Those types of new functions
> > > that require more than a couple thousand fairly specified amino acids
> > > working together at the same time simply do not evolve no matter what
> > > you start with, functional or not.
> >
> > How do you go about determining the difference between 'those types of
> > new functions that require more than a couple of thousand fairly
> > specified amino acids working together at the same time' and 'those
> > types of new functions' that don't? I cannot think of *any* system that
> > requires "a couple of thousand fairly specified amino acids working
> > together at the same time". Can you (and you actually need to *justify*
> > the claim)?
>
> Yes I can and I have presented such examples over and over again -

And over and over again you have failed to do anything beyond
hand-waving arguments tossing out total amino acid numbers as
justification for this. How *do* you determine what counts as a 'fairly
specified amino acid'? How *do* you determine that the amino acids that
bind one protein to another are 'working together at the same time'
rather than just independently doing their thing? By nothing but empty
verbiage and bogus hand-waving numbers.

> such as the flagellar system of bacterial motility. This type of
> function requires, at minimum, at least 20 different kinds of proteins
> working together at the same time. Each of these proteins is composed
> of around 300 fairly specified amino acids on average. This works out
> to around 6,000aa collective, novel, fairly specified amino acid
> positions working together at the same time for this type of function
> to be realized at a minimum level of beneficial selectability.
>
> > >>That denominator is utterly
> > >>irrelevant *unless* your model is that every new functional protein
> > >>arises by a random walk from a random protein sequence. The *real*
> > >>model of evolution assumes a quite different mechanism, the modification
> > >>of a pre-existing protein (or duplicate thereof) in a specific organism.
> > >
> > >
> > > That's exactly the model that I'm talking about.
> >
> > I agree that the *first* model is the one you have been using.
>
> No - I'm taking your *second* "true model" of evolution as "true".

So you say. But that is at variance with reality.

> I
> agree that the real model of evolution assumes the modification of a
> pre-existing protein (or duplicate thereof) in a specific organism.
> That is the model that I'm talking about. That is the model that
> cannot evolve very far beyond the lowest levels of complexity from
> what it started with. I'm using your model Howard - your "real" model
> of evolution. I haven't made up a new model at all. What I am saying
> is that your "real" model doesn't work like you think it does.

The above 'words' are at total variance with the truth. The model you
are proposing *is* the first one, the one that involves a random walk
from a random sequence. That is the truth of what your math is saying.
It is your words that are not telling the truth. I don't know whether
that is from your ignorance wrt what your math is saying or your
ignorance wrt what your words are saying. But there is a discrepancy.

> > > Starting with
> > > pre-exiting proteins having pre-existing individual and collective
> > > beneficial functions, you will not see the evolution of any new type
> > > of protein function that requires more than a couple thousand fairly
> > > specified amino acids working together at the same time.
> >
> > Why would anyone expect to see this? I cannot think of any realistic
> > evolutionary model of any biological system that requires a couple of
> > thousand amino acid changes?
>
> I'm hitting my head against a brick wall here! Come on man! Try and
> understand what I'm saying. The system that requires a couple
> thousand amino acids at minimum most likely does not require a couple
> thousand amino acid changes, starting with something that is already
> there, to be realized. However, on average, such a system would
> probably require several hundred fairly specified amino acid position
> changes to what is already there.

Why do you think that? That is utterly stupid.

> Then, a system that requires 10,000
> fairly specified amino acids at minimum

And where the bloody hell is there a system that *requires* 10,000
bloody 'fairly specified' (whatever that means) amino acids? I think
all such systems are purely imaginary.

> would probably require, on
> average, perhaps as many as 1,000 fairly specified amino acid position
> changes. If just 10% of these were neutral changes, on average, a
> large colony numbering in the trillions would still take trillions
> upon trillions upon trillions of years to evolve even one new type of
> function at such a level of minimum informational complexity
> (complexity = minimum sequence size plus minimum sequence
> specificity).

The above is a statement that you are starting from a completely random
sequence at least 1000 amino acid *changes* away from the end point.
That is equivalent to a claim that the world was destroyed by aliens
last week and we are all now mental images in their computers.
Completely divorced from reality.

So where are these magical systems? And don't just *say* the bacterial
flagella and hand-wave meaningless numbers based on your WAGs wrt the
number of proteins and total number of amino acids. Convince me that
any possible pathway to the current flagella requires *at least one*
proposed step in the evolution of the bacterial flagella that *requires*
starting with a protein that is more than 1000 *required* amino acids
away from the current state. Hell, more than 100 amino acids away.


>
> > >>The sequence space that is *relevant* to evolutionary mechanism is the
> > >>sequence space encoded by the proteins in that organism.
> > >
> > >
> > > Not so. The sequence space that is relevant to the evolution of a
> > > particular gene pool is the sequence space that surrounds all possible
> > > beneficial functions that could be used by that type of organism in
> > > its current environment at various levels of functional complexity.
> >
> > Nope. The sequence space that is relevant to evolutionary mechanisms is
> > the sequence space immediately near (within a few mutational steps of)
> > the existing genome in that organism. Period. End of story.
>
> That would be great if it were true. The fact is that on average, as
> you move up the ladder of functional complexity, there are
> exponentially fewer and fewer starting points that are anywhere near
> any other type of beneficial functional sequence at that level of
> functional complexity.

These are meaningless terms. How do you quantitate "level of functional
complexity"? All I see is that it is purely the total number of amino
acids, or, if you are feeling generous, about 50% of that value. No
analysis or anything. Sheer handwaving numerology.

> There just aren't any sequences within the
> gene pool that are only one or two steps away from a new type of
> function at such levels of complexity. That is why you don't have any
> examples of real time evolution at such levels of complexity. It just
> doesn't happen. Period. End of Story.

How did you determine this? By your GIGO numbers based on the
assumption that evolution works by starting with some random
(nonfunctional?) sequence thousands of nucleotides away from the end point?

> > > Remember, the sequence space changes exponentially depending upon the
> > > level of complexity in question.
> >
> > Since you are unable to quantify "level of complexity", the point is moot.

What, no answer to this?


> >
> > >>That is, the
> > >>only relevant question is if, in *this* non-random sequence space, there
> > >>is a sequence x number of changes away from a selectable functionality
> > >>required by an environmental change.
> > >
> > >
> > > And the lower the level of functional complexity the more of such
> > > functions there will be within a couple steps of what happens to
> > > already be there in the gene pool. However, regardless of the
> > > starting point, the average distance to those functions at higher and
> > > higher levels of complexity increases exponentially.
> >
> > The *only* way this would make sense is if your 'new' function were a
> > teleologically determined goal toward which all changes were made. That
> > is not how evolution works.
>
> Not at all. I'm not talking about any one particular type of
> function, but about all types of functions within a given level of
> complexity.

And what the hell is that supposed to mean?

> No new type of function within that level of higher
> complexity (requiring a few thousand fairly specified amino acids at
> minimum), will be able to evolve given what a genome has to start
> with.

And how did you determine this?

> This is because, on average, what a given genome has to start
> with will be hundreds and even thousands of neutral fairly specified
> amino acid positional changes away from all other types of functions
> within that level of complexity.

IOW, your model is that evolution starts with some random sequence some
thousands of nucleotides away from any useful function and then proceeds
there by a random walk. Except that you disagree and claim you are not
proposing that. That makes your position as clear as the Mississippi.

Howard Hershey

Jan 1, 2004, 11:49:49 AM

Howard Hershey wrote:
>
> Sean Pitman wrote:
> >
> > howard hershey <hers...@indiana.edu> wrote in message news:<bssc8f$5sb$1...@hood.uits.indiana.edu>...
> > > Sean Pitman wrote:
> > > > howard hershey <hers...@indiana.edu> wrote in message news:<bscr3q$12r$1...@hood.uits.indiana.edu>...
> > > >
> > > >>Sean, to make a long story short, the ratio of beneficial versus
> > > >>non-beneficial you give is utterly irrelevant to anything. The
> > > >>denominator of your ratio is always *total sequence space* based on
> > > >>taking 1/20 to the power of the total number of amino acids (or minimum
> > > >>number, which, in those cases you want to be unevolvable, is always the
> > > >>same as total number of amino acids).
> > > >

[snip]


> > <snip>
> > > > I'm interested in the total number of amino acids required for a
> > > > particular type of function to be realized at its most minimum
> > > > beneficial level of function. This level is very different depending
> > > > on the type of function in question.

I am making the claim that you only use your method when you want to
demonstrate a large number.

Then let's use *your* method to analyse the probability that the ebg
genes can evolve into a beta-galactosidase.

1) You seem to agree that the native ebga does not have any selectable
lactase activity. Thus generating selectable lactase activity from ebg
is generating a 'new' function. Is that right?

2) You do agree that the native ebg involves a two peptide system, with
ebga being 1030 amino acids long and ebgc being 149 amino acids long,
both being required for function, plus a regulatory protein (ebgr) which
is also around 1000 amino acids long.

3) You seem to think that knowing the total length of the proteins
involved (in this case, about 1200 for the two that act together at the
same time) and how many proteins are involved in the system (2) allows
you to determine the number of amino acids that are 'fairly specified'
and the 'level of complexity'. Please perform this mathematics on the
ebg system for me. If you cannot calculate "the total number of amino
acids required for a particular type of function to be realized at its
most minimum beneficial level of function" for a simple system like ebg,
what makes you think you can do so for a larger or more complex one?

4) After calculating "the total number of amino acids required for a
particular type of function to be realized at its most minimum
beneficial level of function" (you claim to be able to do so -- at least
I have seen you give estimates of around 480 or so for ebg, but it would
certainly be nice to see what went into the calculation) I want you to
calculate the odds of ebg evolving into a selectable beta-galactosidase
enzyme *based solely on these numbers* and NOT based on any other
knowledge. This estimate of the odds of generating functional lactase
activity from ebg would be called a 'prediction' of *your* 'hypothesis'.

5) Now let's take a different protein or protein system, also 1200 total
amino acids in length. We will make it a bit less complex, by making it
a single unregulated protein. But this protein is in the histidine
pathway. That is, it is a random protein wrt lactase function, chosen
merely because of total amino acids present. Let's say that for *its
function* the very same "total number of amino acids required for a
particular type of function to be realized at its most minimum
beneficial level of function" exists. I want you to calculate the odds
of this protein evolving into a selectable beta-galactosidase enzyme
*based solely on the numbers you think are important* and NOT based on
any other knowledge about this protein.

6) Same thing, except now we have a completely random sequence of 1200
total amino acids.

I will accept failure to evolve a selectable beta-galactosidase activity
in five years as evidence that your math is correct for *that type of
protein* (even though I really should wait a gazillion years, just to be sure).

Oh. Show your math and reasoning. Now I don't want to bias you, but I
strongly suspect that by *your* method of calculating odds there should
be no difference in the odds of any of these proteins evolving a
selectable beta-galactosidase activity that *none* had to begin with
(well, actually your calculation of odds, according to the way it has
been presented here, should favor one of the last two evolving lactase
activity since they are less complex -- being a single protein rather
than two that have to work together).

That is because your bogus math does not take the specific ancestral
sequence and its pre-existing functionality into consideration.
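
To make that complaint concrete, here is a minimal sketch of the kind
of ancestry-blind calculation I am criticizing. The function and its
inputs are an illustrative reconstruction, not anything you have
actually posted; the 0.4 'fairly specified' fraction is chosen only
because it turns 1200 total amino acids into the ~480 figure mentioned
above:

    # An ancestry-blind odds calculation: the only inputs are total
    # length and an assumed "fairly specified" fraction, so the starting
    # protein's existing chemistry never enters the answer.
    import math

    def log10_odds(total_aa, specified_fraction=0.4):
        # odds ~ 1 / 20^(number of "fairly specified" positions)
        return -total_aa * specified_fraction * math.log10(20)

    for start in ("ebg (1030 + 149 aa, glycosidase chemistry in hand)",
                  "histidine-pathway protein (1200 aa)",
                  "random sequence (1200 aa)"):
        print(f"{start}: odds ~ 1e{log10_odds(1200):.0f}")

All three starting points print the same odds, because nothing about
the ancestral sequence or its pre-existing function ever enters the
calculation. That is exactly what the selection experiment below puts
to the test.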

The interesting thing, of course, is that this experiment (selection for
the evolution of galactosidase activity) actually has been run and your
model of calculating the odds has been tested. That is, the prediction
based on your methods of determining the odds of evolving a new function
has been subject to test. Did it pass the test for *all* of the
examples, or only for the examples where you start with a random protein
or a random sequence?

[snip]


> >
> > > In every system where you cannot dispute that a
> > > new function evolved (by one or a few mutational steps), you arbitrarily
> > > declare that, because this change did not require "thousands of amino
> > > acids", it does not count as whatever you think evolution involves and
> > > one does not make the same calculation with a denominator that uses the
> > > total amino acid number.
> >
> > I haven't made any sort of arbitrary declaration. As it turns out,
> > the only examples that you evolutionists have come up with as
> > "real-time" examples of evolution in action have not required anything
> > more than a few hundred loosely specified amino acids working together
> > at the same time.

How do you distinguish between "loosely specified amino acids working
together at the same time" and "fairly specified amino acids working
together at the same time"? By waving your hands and declaring them that?



> No. All examples of evolution in action involve the modification of
> pre-existing systems. That is what descent with (and by) modification
> means. It doesn't matter how many amino acids the pre-existing system has.
>
> > None of your examples of functions using thousands
> > of amino acids at the same time actually require that all of these
> > amino acids be there for minimum beneficial function of that type to
> > be realized.
>
> Well, it is damn hard to find the systems you want amidst all the other
> systems that evolve just fine. But systems that are thousands of amino
> acids long (that is the only operational definition I see you using) can
> be modified by a few mutational changes just as easily as systems less
> than 100 amino acids long.
>
> > It is my hypothesized position that the reason why your
> > examples were only one or two mutational steps away from success is
> > because the relative density of beneficial sequences at such low
> > levels of functional complexity is rather high.

Well, now you can give us numbers and arguments to justify your
hypothesis rather than just hand wave them away.

[snip]

>
> > > > Starting with
> > > > pre-exiting proteins having pre-existing individual and collective
> > > > beneficial functions, you will not see the evolution of any new type
> > > > of protein function that requires more than a couple thousand fairly
> > > > specified amino acids working together at the same time.
> > >
> > > Why would anyone expect to see this? I cannot think of any realistic
> > > evolutionary model of any biological system that requires a couple of
> > > thousand amino acid changes?
> >
> > I'm hitting my head against a brick wall here! Come on man! Try and
> > understand what I'm saying. The system that requires a couple
> > thousand amino acids at minimum most likely does not require a couple
> > thousand amino acid changes, starting with something that is already
> > there, to be realized. However, on average, such a system would
> > probably require several hundred fairly specified amino acid position
> > changes to what is already there.

And what does this have to do with your calculation of the odds? It
seems that you think your calculation says something and then, at the
last minute, you toss it out and make a WAG about how many amino acids
need to change.

> > Then, a system that requires 10,000
> > fairly specified amino acids at minimum
>
> And where the bloody hell is there a system that *requires* 10,000
> bloody 'fairly specified' (whatever that means) amino acids? I think
> all such systems are purely imaginary.

Space for a list of all cellular systems that require 10,000 bloody
'fairly specified' amino acids plus the evidence that all 10,000 sites
are 'fairly specified'.


>
> > would probably require, on
> > average, perhaps as many as 1,000 fairly specified amino acid position
> > changes.

Is this a hand-waving WAG or what? I assume that you mean 1,000 changes
*between* positions with selectable activity of any kind, not 500
changes between selectable function A and selectable function B and
another 500 between selectable function B and selectable function C.
Just want to be clear.

[snip]


>
> > > > You can use
> > > > duplication, point mutation, translocation, frame shifts, etc., and
> > > > they will all fail to get you a new type of function that goes very
> > > > far beyond the lowest levels of functional complexity toward any new
> > > > type of function.
> > >
> > > All evolution is at the lowest levels of functional complexity.
> >
> > I couldn't have said it better myself . . .
> >
> > > All
> > > evolution involves duplication, point mutation, translocation, frame
> > > shifts, etc.
> >
> > Exactly . . .
> >
> > > > Simple up-regulation of what you already have will
> > > > only get you so far. Evolving new sequences with new functions just
> > > > doesn't happen beyond very low levels of functional complexity.
> > >
> > > One simply does not evolve "new sequences" by starting with a protein
> > > which is thousands of amino acids away from the end point before there
> > > is any selectable function. Your hypothetical system simply does not
> > > exist in nature.
> >
> > You're wrong. Such systems do exist in nature and in every living
> > thing. The average distance to simple functions requiring just 100 or
> > so loosely specified amino acid sequences may be only 3 or 4 neutral
> > amino acid changes wide.

How does one distinguish between "loosely specified sequences" and
"fairly specified sequences" without waving one's hands furiously?

> > However, those types of functions that
> > require a minimum sequence of 1,000aa are separated by much wider
> > neutral gaps from everything that a given cell has - an average of,
> > say, 30 or 40 neutral positional changes (i.e., representing an
> > average neutral gap of sequence space of over 1e50 sequences).

Where is your evidence for this? It seems to me that you are
extrapolating from a mathematical calculation that would make the
evolution of ebg into a selectable lactase essentially impossible.

> > Then,
> > when you get up to those functions that require several thousand
> > fairly specified amino acids at minimum, the average gap may grow to
> > 500 or so neutral changes on average (sequence space of 1e650). Are
> > you starting to see the problem here?
>
> So where are these magical systems? And don't just *say* the bacterial
> flagella and hand-wave meaningless numbers based on your WAGs wrt the
> number of proteins and total number of amino acids. Convince me that
> any possible pathway to the current flagella requires *at least one*
> proposed step in the evolution of the bacterial flagella that *requires*
> starting with a protein that is more than 1000 *required* amino acids
> away from the current state. Hell, more than 100 amino acids away.

Remember that no one (except your argument) is proposing that bacterial
flagella arose in one fell swoop with no intermediate states of utility.
Quite the opposite.

Actually I agree to some extent. *If* there aren't systems that are
within a few mutational steps of a new selectable function, it won't
happen. That is why states of intermediate utility (but not necessarily
the teleological or current utility) are considered so important in
producing evolutionary hypotheses (possible pathways). If wings had to
evolve *from nothing* with no utility at all until the organism could
soar like a vulture in the span of a hundred years, wings could not
evolve. But wings did not evolve *from nothing* (forelimb modification
was fairly important) and did not have to generate the soaring ability
of a vulture in the span of a hundred years.



> > > > Remember, the sequence space changes exponentially depending upon the
> > > > level of complexity in question.
> > >
> > > Since you are unable to quantify "level of complexity", the point is moot.
>
> What, no answer to this?

I really would like you to quantify "level of complexity".

Sean Pitman

Jan 1, 2004, 2:28:15 PM
Howard Hershey <hers...@indiana.edu> wrote in message news:<3FF3BC16...@indiana.edu>...


> I have asked this repeatedly, so now I will shout. HOW THE BLOODY HELL
> DO YOU DEFINE WHAT IS OR IS NOT 'FAIRLY SPECIFIED ORDER'
> AND WHAT VALUE DO YOU GIVE IT?

Specified order is defined by the degree of constraints required on
amino acid positions. In other words, how many positions are
completely invariant? How many more positions are partially variant
and to what degree? I have repeatedly talked about this so I am quite
surprised that you don't seem to remember reading such discussions.
Anyway, I will repeat using cytochrome c again as an example:

Based on an analysis of cytochrome c from over 40 species, a minimum
of around 80 amino acids is required for this type of function to be
realized, with around 100 or so amino acids used on average. Of these,
30 amino acid positions are highly constrained - rarely involving
more than one type of amino acid. An additional 36 positions only
vary between 2 or 3 different amino acids. Another 15 positions vary
between no more than 4 different amino acids. Only 4 positions out of
105 positions vary by more than 7 different amino acids. The two most
variable positions (60 and 89) vary by only 9 out of 20 possible amino
acids.

It is quite obvious that the cytochrome c type of function is rather
constrained by both its minimum amino acid requirement as well as the
fairly high degree of specificity required by the sequencing of these
amino acids. Well over 60% of this protein is constrained to within 3
amino acid options out of the 20 that are possible. That is a
significant constraint, wouldn't you say? In fact the most generous
estimates of the total number of possible cytochrome c sequences in
sequence space that I have come across, based on these constraints,
suggest no more than 1e60 cytochrome c sequences exist. If you know
something different from what I have suggested here, please do show me
otherwise.
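
One crude way to turn these per-position counts into an estimate of
the number of tolerated sequences is simply to multiply the number of
observed residue options at each position. Here is a minimal sketch;
the 7 options assigned to the remaining mid-variability positions and
the 9 assigned to the few most variable ones are assumptions filled in
from the counts above:

    # Crude count of tolerated cytochrome c sequences as the product of
    # observed residue options per position (log10 to keep numbers sane).
    import math

    buckets = (  # (number of positions, residue options per position)
        (30, 1),   # highly constrained, essentially invariant
        (36, 3),   # vary between 2 or 3 residues
        (15, 4),   # vary between no more than 4
        (20, 7),   # assumed: the remaining mid-variability positions
        (4, 9),    # the most variable positions, up to 9 residues
    )
    assert sum(n for n, _ in buckets) == 105

    log_count = sum(n * math.log10(k) for n, k in buckets)
    print(f"tolerated sequences ~ 1e{log_count:.0f} "
          f"out of 20^105 ~ 1e{105 * math.log10(20):.0f}")

Counting only the variation actually observed across these ~40 species
gives roughly 1e47 sequences; estimates that also admit untested but
chemically similar substitutions are more generous, which is how one
gets up toward the 1e60 figure just cited.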

Now you ask, "What value do I give such numbers?" What does this
estimate mean? Given that the total number of possible sequences in
sequence space requiring a minimum of 80 amino acids is well over
1e100, the ratio of cytochrome c sequences to non-cytochrome c
sequences is less than 1 in 1e40. This translates into an
average gap of 30 amino acid changes between various islands of
cytochrome c sequence clusters within sequence space. In other words,
if a particular organism did not have a cytochrome c function but
would benefit from this type of function, its current proteins would
differ from the nearest potential cytochrome c sequence by an average
of over 30 amino acid positions. Don't you find that rather
significant? If I were you, this fact would strike me as quite alarming,
considering your belief in the validity of evolution and considering
the fact that far greater levels of specified complexity exist within
all living things.
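
The step from that ratio to an average gap is the same bookkeeping run
in reverse: if functional islands have a density of about 1 in 20^r,
then on average you must travel a Hamming radius of about r changes
before reaching one. A minimal sketch, taking the round numbers above
at face value and assuming the islands are randomly scattered:

    # Invert the rarity of functional sequences into an average gap
    # radius, treating a radius-r neighborhood as holding ~20^r sequences.
    import math

    LOG20 = math.log10(20)

    def gap_from_rarity(log10_rarity):
        return log10_rarity / LOG20

    print(f"1 in 1e40 -> gap of ~{gap_from_rarity(40):.0f} changes")

    # Using the fuller figures (20^80 total / 1e60 functional ~ 1e44):
    log_rarity = 80 * LOG20 - 60
    print(f"1 in 1e{log_rarity:.0f} -> gap of "
          f"~{gap_from_rarity(log_rarity):.0f} changes")

The log-base-20 of 1e40 is about 31, which is where the average gap of
30 amino acid changes comes from; the fuller 1e44 figure gives about 34.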

file:///C:/Documents%20and%20Settings/Sean/My%20Documents/Evolution/References/280,46,Slide
46

> > This is not so for the lactase function.
> > Can a functional lactase enzyme work in a beneficial manner in any
> > life form with only 80 amino acids?
>
> You keep confounding total number of amino acids with number of amino
> acids needed to perform a function.

I suggest to you that the minimum number of amino acids needed to
perform a particular type of function is indeed the total number of
amino acids needed to perform that type of function at its minimum
level of beneficial selectability. I fail to understand what you are
trying to get at here.

> > I don't think so. It seems that
> > the lactase function requires more than 400 fairly specified amino
> > acids (though less specified than in the cytochrome c function) before
> > this type of function can be realized.
>
> So, shouting again, HOW THE BLOODY HELL DID YOU CALCULATE
> A NUMBER OF 400 AMINO ACIDS? I STRONGLY SUSPECT THAT
> NUMBER IS NOTHING BUT A WAG (WILD-ASSED GUESS).

If you don't believe this number, it should be a fairly simple thing
for you to disprove this assertion - which is based on my own BLAST
database search of known lactases. Ian Musgrave also suggested, after
his own database search, that the minimum number might be as high as
480 amino acids for the most basic lactase enzyme. Now, if you can
find a lactase enzyme shorter than 400aa actually working in a living
creature I would be very glad to know of it. Until then, your
hollering isn't going to disprove my position or lessen its predictive
power. It would be much more convincing, on your part, if you were to
actually try to find evidence against the stuff that I say instead of
simply yelling out your incredulous one-liners.

> > Now, can you get the flagellar
> > motility type of function with only 400 coded amino acid positions
> > working together at the same time? I don't think so. This type of
> > function seems to require at least 4,000 to 6,000 fairly specified
> > amino acids working together at the same time in order for the bare
> > minimum level of beneficial function of this type to be realized.
>
> AND AGAIN, HOW THE BLOODY HELL DID YOU CALCULATE THAT
> NUMBER? I STRONGLY SUSPECT THAT NUMBER IS NOTHING BUT
> A WAG. Convince me otherwise.

Now this is promising. At least by making this statement you may be
starting to realize that if my assertions are true, evolution is in
big trouble.

In any case, I notice that you didn't respond when I did roughly
detail how I calculated this number below. How can you yell out this
question after having read what I wrote? I suggest to you that a more
effective response would be to try and counter what I already wrote.
Again, the calculation is based on the following:

This type of function (the flagellar system of motility) requires, at
minimum, at least 20 different kinds of proteins working together at
the same time. If each of these proteins is composed of around 300
fairly specified amino acids on average, this works out to around
6,000aa collective, novel, fairly specified amino acid positions
working together at the same time for this type of function to be
realized at a minimum level of beneficial selectability.

Now there you go. All you have to do is prove that my assertions here
are wrong. Upon what basis am I way off base here? What is the
minimum number of coded amino acid positions that you would suggest in
order to realize this type of function at a minimum level of
beneficial selectability?

And again, that is all I have time for today. Hope you're having or
at least had a very happy New Year! ; )

Sean
www.naturalselection.0catch.com

Howard Hershey

Jan 2, 2004, 9:22:43 AM

Sean Pitman wrote:
>
> Howard Hershey <hers...@indiana.edu> wrote in message news:<3FF3BC16...@indiana.edu>...
>
> > I have asked this repeatedly, so now I will shout. HOW THE BLOODY HELL
> > DO YOU DEFINE WHAT IS OR IS NOT 'FAIRLY SPECIFIED ORDER'
> > AND WHAT VALUE DO YOU GIVE IT?
>
> Specified order is defined by the degree of constraints required on
> amino acid positions. In other words, how many positions are
> completely invariant? How many more positions are partially variant
> and to what degree? I have repeatedly talked about this so I am quite
> surprised that you don't seem to remember reading such discussions.
> Anyway, I will repeat using cytochrome c again as an example:
>
> Based on an analysis of cytochrome c from over 40 species, a minimum
> of around 80 amino acids is required for this type of function to be
> realized, with around 100 or so amino acids used on average.

What do you mean by "required" in this context? That they must be
invariant? That they must be hydrophobic? That they must *not* be
proline? Again, you repeatedly fail to define 'fairly specified' and
how you determine that an amino acid is 'fairly specified'.

> Of these, 30 amino
> acid positions are highly constrained - rarely involving more
> than one type of amino acid.

Rarely is not never. How many are invariant? It seems to me that you
count an amino acid as 'fairly specified' if it exhibits *any*
constraint and treat it as if it were invariant. Actually, I haven't
seen you calculate anything. All I have seen you do is say that there
are positions that are very highly constrained, positions that are
somewhat less constrained, and positions that are even less constrained.
Then you wave your hands and come up with a number.

Again, it is well known that all *modern* cytochrome c's have a high
percentage of evolutionarily constrained sequences. That is because
cytochrome c is small and most of its amino acids are in contact with
the substrate. The high degree of evolutionary constraint is what makes
this sequence particularly useful for analyzing deep phylogeny. The
same is true for histones, for much the same reason. But you typically,
and misleadingly, apply these percentages of evolutionary constraint to
large molecules that show a much, much lower amount of evolutionary
constraint. According to your logic, the small fibrinogen peptide
sequence should be highly constrained as well, as it is part of a very
large fibrin protein.

> An additional 36 positions only vary
> between 2 or 3 different amino acids. Another 15 positions vary
> between no more than 4 different amino acids. Only 4 positions out of
> 105 positions vary by more than 7 different amino acids. The two most
> variable positions (60 and 89) vary by only 9 out of 20 possible amino
> acids.

Yes. I have no problem with the idea that cytochrome c has a higher
degree of evolutionary constraint than other proteins. There is
definitely evidence for that. But remember that you are only looking at
40 sequences (which probably overrepresent metazoan animals that
diverged recently). Positions that vary by more than 7
amino acids are *not* highly constrained, when you are only looking at
40 sequences, given that, even assuming a random sampling, you wouldn't
expect to see all 20 amino acids at any single position when you are
only examining 40 sequences.

More importantly, however, there is no way you can use these numbers
from any *modern* protein or system to say *anything* about how
difficult or easy it was to evolve that system. The numbers are
completely irrelevant. The *only* model of evolution that you can use
these numbers to test is the model your math tells us you are testing.
Specifically, these numbers can tell us only the odds of the modern
system evolving from the starting point of a random sequence by a
process of a complete random walk with no possible intermediate states
of utility. Since no one proposes that any *modern* protein or system,
with rare exceptions like the nylonase case, evolves this way, your
numbers are irrelevant *even if* they are correct.


>
> It is quite obvious that the cytochrome c type of function is rather
> constrained by both its minimum amino acid requirement

You *still* haven't told me how you calculated "minimum amino acid
requirement" besides say that some positions in cytochrome c are
strongly conserved, others less conserved, and others still less. Then
we get a wave of your hand and a number.

> as well as the
> fairly high degree of specificity required by the sequencing of these
> amino acids. Well over 60% of this protein is constrained to within 3
> amino acid options out of the 20 that are possible.

The average time for a 100% probability of replacement of a single
selectively neutral amino acid is 100 million years, Sean. If two of
the sequences you examine were eutherian mammals (which separated some
60 million years ago), that would mean that 40% of their sequence would
be identical *EVEN IF* every amino acid in their cytochrome c were
selectively neutral and free to drift. And it would take around 300
million years for all the possible amino acids due to neutral selection
to be reached from an ancestral sequence *by chance alone* (selection,
of course, works much, much faster -- indeed, in certain environments
one generation is enough time) because you would need mutation at more
than one site in a codon. Have you taken this into account at all? I
sure can't tell, because all you do is make a series of statements and
then wave your hand to come up with a number. Not that it really
matters, since even if the number were correct, it would be irrelevant.
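
To put numbers on that: the 40% figure is just linear bookkeeping (60
million years elapsed out of 100 million years per replacement leaves
40% of sites untouched). A Poisson model of neutral replacement, which
is the more usual treatment and an assumption on my part here, lands
in the same ballpark. A minimal sketch:

    # Expected fraction of sites still unreplaced after time t, given a
    # mean per-site neutral replacement time tau, under a linear reading
    # and under a Poisson model (one lineage drifting, or both).
    import math

    tau = 100e6   # mean time to replace one neutral site (years)
    t = 60e6      # approximate eutherian divergence (years)

    linear = 1 - t / tau                      # the linear bookkeeping
    poisson_one = math.exp(-t / tau)          # one lineage drifting
    poisson_two = math.exp(-2 * t / tau)      # both lineages drifting

    print(f"identical sites: linear {linear:.0%}, "
          f"Poisson one lineage {poisson_one:.0%}, "
          f"Poisson both lineages {poisson_two:.0%}")

On any of these readings, a third to a half of the sites in two
recently diverged mammalian cytochrome c's would still match *even
with* zero functional constraint, so raw identity badly overstates
'fair specification'.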

> That is a
> significant constraint, wouldn't you say? In fact the most generous
> estimates of the total number of possible cytochrome c sequences in
> sequence space that I have come across, based on these constraints,
> suggest no more than 1e60 cytochrome c sequences exist. If you know
> something different from what I have suggested here, please do show me
> otherwise.

That certainly is a large number of potential different cytochrome c
sequences (that is, sequences with quite significant cytochrome c
activity). And the question is, of what possible relevance is that
knowledge to what you are saying wrt the impossibility of evolution?
Unless, of course, your logic is that cytochrome c sequences arise by a
completely random process from a random sequence?



> Now you ask, "What value do I give such numbers?" What does this
> estimate mean? Given that the total number of possible sequences in
> sequence space requiring a minimum of 80 amino acids is well over
> 1e100, the ratio of cytochrome c sequences to non-cytochrome c
> sequences is less than 1 in 1e40.

SFW? You keep coming up with this bogus ratio that presumes that
cytochrome c, or whatever protein or system one is talking about, is
derived by starting from a random sequence and generating the final
result by a completely random walk with no possible states of
intermediate utility. And then you turn around and deny that that is
what you are saying.

> This translates into an
> average gap of 30 amino acid changes between various islands of
> cytochrome c sequence clusters within sequence space.

And how is this "average gap" calculated from those two numbers? Be
precise. And is that or is that not the "average gap" between a
*random* protein or a *random* sequence and some sequence that has
cytochrome c activity, just as all your other calculations are based on
putative gaps between *random* proteins and *random* sequences and some
teleologically determined end point with the proviso that only a random
walk with no functional intermediates is possible between the two states?

Of course, as my little exercise with ebg shows, evolution does not work
by starting with a *random* protein or a *random* sequence. And all
that matters is the number of mutational steps required to generate a
selectable activity even if that activity is not the end activity in
question. IOW, your numbers are totally irrelevant. Not *even* wrong.
But as useful as knowing the distance between earth and the furthest
galaxy is to determining how one gets from L.A. to San Francisco.
Utterly irrelevant.

> In other words,
> if a particular organism did not have a cytochrome c function but
> would benefit from this type of function, its current proteins would
> differ from the nearest potential cytochrome c sequence by an average
> of over 30 amino acid positions.

Evolution would not *start* with the average protein. It would start
with the extreme end of the bell-shaped distribution of proteins that is
only 1 or 2 amino acids away from a selectable function. In the case of
cytochrome c, it would start with a protein that already has an affinity
for heme (there are other heme-binding proteins) and that, at least some
of the time, restricted the direction of electron transfer to the ends,
and would subsequently select for variants that more frequently and with
greater stability covered the heme's surfaces to direct electron transfer
to the ends, with
selection favoring nearly every step of this process. That process
would quickly (but only on a geological timescale, not a human one)
reach an 'optimal' state. Subsequent change would produce (and be
largely limited to) the selectively neutral differences we see in all
the modern cytochrome c's. Unlike the changes due to selection, these
selectively neutral differences occur slowly on a geological timescale.

> Don't you find that rather
> significant? If I were you this fact would strike me as quite alarming
> considering your belief in the validity of evolution and considering
> the fact that far greater levels of specified complexity exist within
> all living things.

I consider your "fact" as 1) unsubstantiated even in its own terms, 2)
irrelevant even if correct. Nothing you have said here makes your
statements either substantiated, or relevant if they were substantiated.


>
> > > This is not so for the lactase function.
> > > Can a functional lactase enzyme work in a beneficial manner in any
> > > life form with only 80 amino acids?

80 total amino acids or 80 amino acids involved in the hydrolysis of a
particular glycoside? The number of amino acids involved directly in
the hydrolysis is quite small. The number of amino acids involved in
binding a particular sugar (galactose rather than, say, glucose) is also
small and different from and independent of the amino acids involved in
the hydrolysis reaction. Yet your argument is that one cannot change
just, say, the sugar binding site of a previously existing protein but
must change *all* the amino acids to invent all these 'functions' from a
random sequence. And, as I have pointed out, no one thinks that lactase
of any kind arose from a *random* protein or a *random* sequence.
Rather, it arose by modification of a pre-existing protein that already
hydrolysed glycoside linkages. Any calculation of the odds of lactase
arising from some average or *random* protein or sequence is simply
irrelevant, whether the number of amino acids involved in the active
sites and binding sites is 80, 480, 1030, or 3.

> > You keep confounding total number of amino acids with number of amino
> > acids needed to perform a function.
>
> I suggest to you that the minimum number of amino acids needed to
> perform a particular type of function is indeed the total number of
> amino acids needed to perform that type of function at its minimum
> level of beneficial selectability. I fail to understand what you are
> trying to get at here.

I am trying to see how you actually determine "minimum number of amino
acids needed to perform a particular type of function". You keep
presenting these hand-waving numbers that appear to be nothing more than
WAGs. Not, of course, that knowing this would make your argument, based
as it is on the idea that evolution works by starting with a *random*
sequence and proceeds by a *random* walk. But it is an interesting
point in its own right.


>
> > > I don't think so. It seem like
> > > the lactase function requires more than 400 fairly specified amino
> > > acids (though less specified than in the cytochrome c function) before
> > > this type of function can be realized.
> >
> > So, shouting again, HOW THE BLOODY HELL DID YOU CALCULATE
> > A NUMBER OF 400 AMINO ACIDS? I STRONGLY SUSPECT THAT
> > NUMBER IS NOTHING BUT A WAG (WILD-ASSED GUESS).
>
> If you don't believe this number, it should be a fairly simple thing
> for you to disprove this assertion - which is based on my own BLAST
> database search of known lactases.

That is insufficient information. *HOW* did you use the BLAST database
to determine this number?

> Ian Musgrave also suggested, after
> his own database search, that the minimum number might be as high as
> 480 amino acids for the most basic lactase enzyme. Now, if you can
> find a lactase enzyme shorter than 400aa actually working in a living
> creature I would be very glad to know of it.

HOW was this determined? Stop beating around the bush. If your number
is nothing but a WAG based on very little analysis, tell me. If you
actually did some analysis tell me what you did. I am not saying that
the number is wrong or right (although it certainly is irrelevant). I
just want to know HOW THE BLOODY HELL YOU ARRIVED AT THAT NUMBER IN A
WAY THAT WAS NOT JUST A WAG. What assumptions went into your
'calculations'? How did you arrive at the numbers you did?

> Until then, your
> hollering isn't going to disprove my position or lessen its predictive
> power. It would be much more convincing, on your part, if you were to
> actually try to find evidence against the stuff that I say instead of
> simply yelling out your incredulous one-liners.

How can I even try to find evidence against what you say when all I have
is a bunch of statements about constrained sites and then a hand-wave
and a final number? Until you tell me the apparently secret formula you
used for calculating the number you come up with, all I have is this
irrelevant hand-wave number that you *assert* without evidence is
somehow meaningful. Once you tell me what assumptions and calculations
went into generating these numbers, I can probably tell you where you
went wrong -- most likely by taking a number and making it more
meaningful than it really is or by taking any amino acid site you can
conceivably say shows some sort of 'constraint' and treating it like it
needed to be invariant.

Again, even if the number were correct, it would be irrelevant to any
model of evolution except the one where you start with a completely
random sequence and proceeds by a functionless random walk. But it would
be nice to be able to say that the number was correct or incorrect. I
cannot do that until you tell me how it was determined.


>
> > > Now, can you get the flagellar
> > > motility type of function with only 400-coded amino acid positions
> > > working together at the same time? I don't think so. This type of
> > > function seems to require at least 4,000 to 6,000 fairly specified
> > > amino acids working together at the same time in order for the bare
> > > minimum level of beneficial function of this type to be realized.
> >
> > AND AGAIN, HOW THE BLOODY HELL DID YOU CALCULATE THAT
> > NUMBER? I STRONGLY SUSPECT THAT NUMBER IS NOTHING BUT
> > A WAG. Convince me otherwise.
>
> Now this is promising. It least by making this statement you may be
> starting to realize that if my assertions are true that evolution is
> in big trouble.
>
> In any case, I notice that you didn't respond when I did roughly
> detail how I calculated this number below. How can you yell out this
> question after having read what I wrote? I suggest to you that a more
> effective response would be to try and counter what I already wrote.
> Again, the calculation is based on the following:
>
> This type of function (the flagellar system of motility) requires, at
> minimum, at least 20 different kinds of proteins working together at
> the same time. If each of these proteins is composed of around 300
> fairly specified amino acids on average this works out to around
> 6,000aa collective, novel, fairly specified amino acid positions
> working together at the same time for this type of function to be
> realized at a minimum level of beneficial selectability.

I didn't respond because it is so damn vague. I know that the flagella
has 20+ different proteins. It is up to you to specify which proteins
are to be included in 'the system'. I do not know that these proteins
are all *working* together *at the same time*. Most of them are simply
attached to other proteins and do nothing at all but provide a structure
that gets moved around by the independent actions of other proteins.
Much of the movement is induced by only a few of the proteins, and they
have little or no contact with most of the other proteins. Yet other
proteins involved in flagella production are involved as scaffolding in
the construction and are not present in the final product. So you
really do need to name names and tell me exactly what you mean by
"working together at the same time". And you certainly have not
presented any justification at all that each of these proteins involve
300 'fairly specified amino acids on average' nor presented any detailed
analysis of how one can determine this.

And, as I have repeatedly pointed out, even *if* you could do that, you
have yet to convince me that anyone thinks that the bacterial flagella
arose from a whole set of 20 *average* or *random* proteins or 20
different *average* or *random* sequences by a random walk with no
possibility of intermediate utility. That is compared to the usual
evolutionary alternative mechanism of the flagella evolving stepwise
using decidedly non-average, non-random proteins which have functional
relevance to each step in the process and in which each step produces
intermediate structures that have independent functional and selectable utility.

After all, the evolution of lactase activity in E. coli when its
original lacZ is deleted occurs, not by a random walk from a random
protein or a random sequence, but by a small selectable step in a
*specific* pre-existing protein that is only one mutation away from
having some selectable lactase activity. All your calculations based on
minimal number of amino acids and complexity are utterly irrelevant in
that case. All that counted was the existence of a specific protein
without lactase activity, but with the potential to be modified to have
that activity, i.e., the number of mutational steps required to get
activity in the case where evolution *did* occur was completely
unrelated to numbers you would calculate for lactase activity. It is
likely also irrelevant in all other cases of evolution as well.



> Now there you go. All you have to do is prove that my assertions here
> are wrong.

Like I say, whether they are wrong or right in their own terms doesn't
really matter. They would still be irrelevant. And I can neither agree
with nor disprove your assertions (your WAG numbers) until you tell me,
very explicitly, how they were arrived at.

> Upon what basis am I way off base here?

The basis where you assume, in your calculations, that evolution works
by starting with a random protein or a random sequence and proceeds by a
random walk with no possible intermediate utility. Other than that, I
cannot argue whether your numbers are right or wrong because you never
tell me how you went about generating them, what assumptions you make
and how you dealt with complexities like the rate of neutral drift. You
just make assertions and then poof the number out as an unevidenced WAG.
And then pretend it is my fault that I cannot read your mind wrt how you
came up with that number.

> What is the
> minimum number of coded amino acid positions that you would suggest in
> order to realize this type of function at a minimum level of
> beneficial selectability?

I wouldn't start with a random sequence or random protein and proceed by
a random walk with no possible intermediate utility. Evolution doesn't.
So why should I if I want to say something about how evolution really works?

RobinGoodfellow

unread,
Jan 3, 2004, 4:06:24 AM1/3/04
to
seanpi...@naturalselection.0catch.com (Sean Pitman) wrote in message news:<80d0c26f.03123...@posting.google.com>...

Good gravy! That was so wrong, it feels wrong to even use the word
"wrong" to describe it. All I can recommend is that you run, don't
walk, to your nearest college or university, and sign up as quickly as
you can for a few math and/or statistics courses: I especially
recommend courses in probability theory and stochastic modelling.
With all due respect, Sean, I am beginning to see why the biologists
and biochemists in this group are so frustrated with you: my
background in those fields is fairly weak - enough to find your
arguments unconvincing but not necessarily ridiculous - but if you are
as weak with biochemistry as you are with statistical and
computational problems, then I can see why knowledgeable people in
those areas would cringe at your posts.

I'll try to address some of the mistakes you've made below, though I
doubt that I can do much to dispel your misconceptions. Much of my
reply will not even concern evolution in a real sense, since I wish to
highlight and address the mathematical errors that you are making.

> RobinGoodfellow <lmuc...@yahoo.com> wrote in message news:<bsd7ue$r1c$1...@news01.cit.cornell.edu>...

> > It is even worse than that. Even random walks starting at random points
> > in N-dimensional space can, in theory, be used to sample the states
> > with a desired property X (such as Sean's "beneficial sequences"), even
> > if the number of such states is exponentially small compared to the
> > total state space size.
>
> This depends upon just how exponentially small the number of
> beneficial states is relative to the state space.

No, it does not. If you take away anything from this discussion, it
has to be this: the relative number of beneficial states has virtually
no bearing on the amount of time a local search algorithm will need to
find such a state. The things that *would* matter are the
distribution of beneficial states through the state space, the types
of steps the local search is allowed to take (and the probabilities
associated with each step), and the starting point. For an extreme
example, consider a space of strings of length 1000, where
each position can be occupied by one of 10 possible characters.
Suppose there are only two beneficial strings: ABC........, and
BBC........ (where the dots correspond to the same characters). The
allowed transitions between states are point mutations, that are
equally probable for each position and each character from the
alphabet. Suppose, furthermore, that we start at the beneficial state
ABC. Then, the probability of a transition from ABC... to BBC... in a
single mutation 1/(10*1000) = 1/10000 (assuming self-loops - i.e.
mutations that do not alter the string, are allowed). Thus, a random
walk that restarts each time after the first step (or alternatively, a
random walk performed by a large population of sequences, each
starting at state ABC...) is expected to explore, on average, 10000
states before finding the next beneficial sequence. Now, below, we
will apply your model to the same problem.
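
A quick simulation bears this out. Here is a minimal sketch in Python,
under exactly the toy assumptions above (10-character alphabet, strings
of length 1000, uniform point mutations with self-loops, restarting from
ABC... after every step); the digits 0-9 simply stand in for the 10
characters:

import random

def trials_until_beneficial(n=1000, alphabet_size=10):
    # One restarted random walk: each trial mutates a uniformly random
    # position to a uniformly random character (self-loops allowed).
    # Success = the single point mutation A -> B at position 0, which
    # is the only difference between ABC... and BBC...
    trials = 0
    while True:
        trials += 1
        pos = random.randrange(n)
        char = random.randrange(alphabet_size)
        if pos == 0 and char == 1:   # character 1 stands in for 'B'
            return trials

runs = [trials_until_beneficial() for _ in range(2000)]
print(sum(runs) / len(runs))   # ~10000 = 10*N, nowhere near 10^1000/2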

> It also depends
> upon how fast this space is searched through. For example, if the
> ratio of beneficial states to non-beneficial states is as high as say,
> 1 in a 1e12, and if 1e9 states are searched each second, how long with
> it take, on average, to find a new beneficial state?

OK. Let's take my example, instead, and apply your calculations.
There are only 2 beneficial sequences, out of the state space of
1e1000 sequences. Since the ratio of beneficial sequences to
non-beneficial ones is (2/10^1000), if your "statistics" are correct,
then I should be exploring 10^1000/2 states, on average, before
finding the next beneficial state. That is a huge, huge, huge number.
So why does my very simple random walk explore only 10,000 states,
when the ratio of beneficial sequences is so small?

The answer is simple - the ratio of beneficial states does NOT matter!
All that matters is their distribution, and how well a particular
random walk is suited to explore this distribution. (Again, it is a
gross, meaningless over-simplification to model evolution as a random
walk over a frozen N-dimensional sequence space, but my point is that
your calculations are wrong even for that relatively simple model.)

> It will take
> just over 1,000 seconds - a bit less than 20 minutes on average. But,
> what happens if at higher levels of functional complexity the density
> of beneficial functions decreases exponentially with each step up the
> ladder? The rate of search stays the same, but the junk sequences
> increase exponentially and so the time required to find the rarer and
> rarer beneficial states also increases exponentially.

The above is only true if you use the following search algorithm:
1. Generate a completely random N-character sequence
2. If the sequence is beneficial, say "OK";
   otherwise, go to step 1.

For an alphabet of size S, where only k characters are "beneficial" for
each position, the above search algorithm will indeed need to explore
exponentially many states in N (on average, (S/k)^N), before finding a
beneficial state. But, this analysis applies only to the above search
algorithm - an extremely naive approach that resembles nothing that
is going on in nature. The above algorithm isn't even a random walk
per se, since random walks make local modifications to the current
state, rather than generate entire states anew. A random walk
starting at a given beneficial sequence, and allowing certain
transitions from one sequence to another, would require a completely
different type of analysis. In the analyses of most such search
algorithms, the "ratio" of beneficial sequences would be irrelevant -
it is their *distribution* that would determine how well such an
algorithm would perform. My example above demonstrates a problem
where the ratio of beneficial states is extremely tiny, yet the
search finds a new beneficial state relatively quickly. I could also
very easily construct an example where the ratio is nearly one, yet a
random walk starting at a given beneficial sequence would stall with a
very high probability. In other words, Sean, your calculations are
irrelevant for the kind of problem you are trying to analyze. If you
wish to model evolution as a random walk of point mutations on a
frozen N-dimensional sequence space, you will need to apply a totally
different statistical analysis: one that takes into account the
distributions of known "beneficial" sequences in sequence space. And
then I'll tell you why that model too is so wrong as to be totally
irrelevant.
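
To see just how differently the two procedures scale, here is a short
sketch (Python, purely illustrative) comparing the naive algorithm's
expected (S/k)^N states against the 10*N of the local walk in my example:

import math

S, k = 10, 1     # alphabet size; "beneficial" characters per position
for N in (10, 100, 1000):
    log10_naive = N * math.log10(S / k)    # log10 of (S/k)^N
    local_walk = 10 * N                    # toy example above
    print(f"N={N:5d}   naive ~ 10^{log10_naive:.0f} states   "
          f"local walk ~ {local_walk} states")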

> > Such random walks are at the heart of
> > Monte-Carlo methods, used to solve a wide variety of problems in
> > physics, statistics, computer science, etc. The time requirements
> > for such a random walk would depend on the distribution of valid states
> > (i.e. "beneficial sequences") in the space, the transition probabilities
> > between each state, and, to a lesser extent, the starting point.
>
> Exactly. And the "beneficial sequences" (i.e., the density of
> beneficial sequences) is inversely related, in an exponential manner,
> to the level of minimum informational complexity required for these
> functions to work at a minimum level of beneficial function.

Already addressed above. By the way, you still owe me a working
definition of "informational complexity". Is it related to the total
number of amino acids in all the proteins of a system? The number of
amino acids at the active sites? The amount of genomic information
needed to code for those amino acids? And what exactly is a "minimum
level of beneficial function"? Would such a level remain
invariant, or would it change depending on an organism's environment
(e.g. for the lactase function, do you think this "minimum level"
would be the same in lactose rich and lactose poor environments?)
Finally, how do we determine that a certain level of complexity is
*required* to achieve this "minimum level of beneficial function" (as
opposed to simply observing that systems with such and such level of
complexity happen to perform this function)?

> > Of
> > course, the size of each state (i.e. the dimension of the space) is also
> > a factor, but the key point (and what makes Monte-Carlo techniques so
> > useful) is that the relationship between time and state size need not be
> > exponential. Depending on the specific details described above, the
> > time requirement may be any function of state size - possibly even a
> > linear function!
>
> No, go and check these formulas again and then show me how they are
> "linear" with increasing minimum state sizes. They are not linear,
> but exponential relationships.

Really, Sean? Well, in the toy example I gave above, the number of
states searched is exactly linear in the state size. Specifically,
for a sequence of length N, 10*N states must be searched before the new
beneficial state is found. Which, again, reinforces my point that
the relative number of "beneficial states" doesn't matter: their
distribution does. Your "formulas" are irrelevant.
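
The linearity follows directly from the fact that each restarted step
succeeds with probability p = 1/(10*N), and a geometric random variable
has mean 1/p. A tiny sketch of the closed form (same toy model as above):

def expected_states(n, alphabet_size=10):
    # Mean of a geometric distribution with p = 1/(alphabet_size * n)
    return alphabet_size * n

for n in (100, 1000, 10000):
    print(n, expected_states(n))   # 1000, 10000, 100000: linear in n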

All in all, do you really think that Monte-Carlo random-walk
procedures would be used in practice if their expected running times
were exponential in the size of the problem? Exponential functions do
not scale very well at all: that is why such procedures are used in
the first place to search through exponentially large state spaces for
an exponentially small number of solutions in *sub-exponential* time.
Perhaps you should contact all the researchers relying on these
methods day-to-day, and tell them to stop wasting their time because
these methods don't work - despite the countless times that they have,
and the theoretical analysis demonstrating that they do? Or, as a
better idea, perhaps you should realize that statistical problems
usually require much more sophisticated answers than raising 20 to the
N-th power, and go learn something?

> However, even if they actually were
> linear as you suggest, this would still pose a significant problem to
> evolution beyond a certain point of informational complexity. Even a
> linear decrease in density with increasing minimum space size would
> result in a linear increase in required time to find new functions at
> that level of complexity.

Thank you for a hearty laugh! For further amusement, I would really
love to see your calculations to back up this claim. I am tempted to
nominate it for a Chez Watt, but I don't know how many computer
scientists read this forum. But please, enlighten me. What would
this level of complexity be? And why do you think such systems exist
in nature?

This is all the time I have for now. I'll try to get back to the rest
of your post within the next day or two, but for now I'd like to leave
you with an admonishment. It is clear that your background in some of
the areas where you are arguing is, to put it mildly, tenuous.
However, it is also clear that you are an intelligent person, and at
least to an extent, curious about the world. Don't you think that you
owe it to yourself to go out and
learn something about the subjects you wish to argue, so as at least
to appear credible when presenting your arguments to individuals with
a certain amount of in-depth knowledge in these fields? So far you
have failed to win such credibility for yourself, which is a shame,
since your intentions appear to be honest. Perhaps taking the time to
revise, adjust, or possibly retract some of your claims would be
taking a step in the right direction?

Cheers,
RobinGoodfellow.

[snip rest for now]

Sean Pitman

unread,
Jan 3, 2004, 6:18:21 AM1/3/04
to
Howard Hershey <hers...@indiana.edu> wrote in message news:<3FF4957C...@indiana.edu>...


> 1) You seem to agree that the native ebga does not have any selectable
> lactase activity. Thus generating selectable lactase activity from ebg
> is generating a 'new' function. Is that right?

Yes.

> 2) You do agree that the native ebg involves a two peptide system, with
> ebga being 1030 amino acids long and ebgc being 149 amino acids long,
> both being requried for function, plus a regulatory protein (ebgr) which
> is also around 1000 amino acids long.

At this point it might be helpful to consider that the usual wild type
lacZ genes in E. coli produce a tetramer beta-galactosidase. Each
subunit of this tetramer is around 1000aa in size. However, this is
not the minimum size requirement for this type of function to be
realized at a beneficial level of selectability. The minimum size
requirement seems to be well over 400aa. Considering that 12 to 14 of
15 active site residues are identical between LacZ and ebgA, I would
also think that the minimum sequence requirements would also be
similar (i.e., somewhere around 400aa). Also, it is interesting to
note that the ebgC sequence has none of the active site residues and
yet it seems to be essential, as you noted yourself, for the lactase
function. It appears that this small subunit is essential for the
optimal operation of electrophilic catalysis by the active-site Mg^2+.
Also note that a correct mutation in either the ebgR or the ebgA
genes alone will allow selectably advantageous lactase ability. Of
course both mutations occurring at the same time allow for a much
stronger lactase function, but both mutations are not required before
selectable lactase function can be realized. It is known that the
mutation in ebgR arises first that allows the cells to grow very
slowly on lactulose. The second mutation (in the ebgA gene) then
arises and allows the double mutants to grow very rapidly.

http://www.biochemj.org/bj/325/0117/3250117.pdf
http://www.science.siu.edu/microbiology/micr460/460%20Pages/460.SPAM.html

> 3) You seem to think that knowing the total length of the proteins
> involved (in this case, about 1200 for the two that act together at the
> same time) and how many proteins are involved in the system (2) allows
> you to determine the number of amino acids that are 'fairly specified'
> and the 'level of complexity'.

As I have said many many times before, I am interested in knowing the
*minimum* number and specificity of amino acids required to achieve a
particular type of function. I dare say that the 1200aa normally used
in this case are not all needed are and are not all that constrained.
As explained already, a more likely minimum number of required amino
acids is probably somewhere around 400 relatively loosely specified
amino acids.

> Please perform this mathematics on the
> ebg system for me. If you cannot calculate "the total number of amino
> acids required for a particular type of function to be realized at its
> most minimum beneficial level of function" for a simple system like ebg,
> what makes you think you can do so for a larger or more complex one?

But I can. I suggest to you that the type of function produced by the
ebg system has a very similar minimum size requirement and positional
constraint limits as do other lactase genes/systems which seem to have
a minimum requirement of somewhere over 400 relatively loosely
specified amino acids.

Now, you can easily prove me wrong here by finding a functional
lactase enzyme that requires less than 400aa. Do you know of such a
lactase that actually works to some selectable advantage in any living
thing?

> 4) After calculating "the total number of amino acids required for a
> particular type of function to be realized at its most minimum
> beneficial level of function" (you claim to be able to do so -- at least
> I have seen you give estimates of around 480 or so for ebg, but it would
> certainly be nice to see what went into the calculation)

This calculation is based on my own database search and the searches of
others that suggest that there are no functional lactase enzymes
smaller than 400aa. So, it seems like the ~400aa level is the best
"minimum" requirement that the evidence available to me so far
supports. If you think otherwise, please do present this evidence.
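
For what it is worth, the final step of such a survey is easy to script.
A sketch in Python (the file name is hypothetical; it assumes you have
already saved the protein hits from a BLAST or similar database search
in FASTA format):

def fasta_lengths(path):
    # Yield (header, length in amino acids) for each FASTA record.
    header, length = None, 0
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, length
                header, length = line[1:], 0
            elif line:
                length += len(line)
    if header is not None:
        yield header, length

# "lactases.fasta" is a hypothetical file of database search hits
records = list(fasta_lengths("lactases.fasta"))
print(min(records, key=lambda rec: rec[1]))   # shortest annotated lactase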

>I want you to
> calculate the odds of ebg evolving into a selectable beta-galactosidase
> enzyme *based solely on these numbers* and NOT based on any other
> knowledge. This estimate of the odds of generating functional lactase
> activity from ebg would be called a 'prediction' of *your* 'hypothesis'.

The odds are extremely good that the wild-type ebg sequence will
evolve into a selectable beta-galactosidase in short order (one or two
generations) since it is only a single positional change away from
success, but that is not the important question. My idea doesn't look
at sequences so much as it looks at types of functions. What are the
odds that a particular organism or group of organisms will have
anything within their collective genomes that is close enough to
evolve any type of new beneficial function within a given level of
specified complexity? That is the important question.

Given this question, it is very interesting that the E. coli bacterial
species seems to have a "spare tire" lactase gene that is just one
mutation away from success. This would not be such an interesting
finding if lactases where less specified than they are. For example
if the density of lactase sequences in 400aa level of sequence space
were say as high as 1 in a billion, the average gap between lactases
would be less than 7 mutations wide. For a colony of bacteria
numbering say 10 billion individuals, this gap would be crossed in no
more than several months by all types of bacteria. What is
interesting though is the very "limited evolutionary potential" that
many types of bacteria have when it comes to the evolution of this
relatively simple enzymatic function. Without their spare tire gene,
E. coli cannot evolve this lactase function despite very positive
selection pressure, artificially elevated mutation rates, and tens of
thousands of generations of time. Many other types of bacteria have
not been able to evolve this relatively simple lactase function
despite well over a million generations of documented observation.

So, what does this mean? It means that the density of sequences with
the lactase function is actually quite low. This low density is what
limits the evolutionary potential of many organisms that would
otherwise benefit from a lactase enzyme if they were able to evolve
one. The fact that they do not evolve one means that the gap between
what they have and the nearest lactase enzyme is simply more than a
dozen fairly specified mutations away.
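
The conversion between density and gap width used above is just a
base-20 logarithm. A sketch in Python (the 1-in-a-billion density is the
illustrative figure from this paragraph, not a measured value):

import math

def avg_gap(density):
    # Specified amino acid changes needed, on average, to cross between
    # islands at a given density: solve 20^gap = 1/density for gap.
    return math.log(1.0 / density, 20)

print(avg_gap(1e-9))   # ~6.9, i.e. "less than 7 mutations wide"
print(20 ** 7)         # ~1.28e9 sequences in such a gap, which a colony
                       # of 10 billion could sort through quickly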

> 5) Now let's take a different protein or protein system, also 1200 total
> amino acids in length. We will make it a bit less complex, by making it
> a single unregulated protein.

The fact that a protein operates as a single unit does not make it
less complex than a multiprotein function that requires the same
minimum amino acid number and level of specificity. Also, all protein
functions are regulated in one form or another.

> But this protein is in the histidine
> pathway. That is, it is a random protein wrt lactase function, chosen
> merely because of total amino acids present. Let's say that for *its
> function* the very same "total number of amino acids required for a
> particular type of function to be realized at its most minimum
> beneficial level of function" exists. I want you to calculate the odds
> of this protein evolving into a selectable beta-galactosidase enzyme
> *based solely on the numbers you think are important* and NOT based on
> any other knowledge about this protein.

Starting with a random sequence of 1,200aa acting in some beneficial
manner, you are asking how long it would take to evolve a
beta-galactosidase? Is that what you are asking? If so, then say the
density of lactases in sequence space of 400aa minimum was low enough
to require 24 specified mutations, on average, to go from one lactase
island to another. If true, then, on average, a sequence of 400aa in
a given gene pool would be around 12 specified mutations away from the
closest lactase sequence, creating a gap of 4,000 trillion non-lactase
sequences. Say the colony size is 1 trillion individuals living in a
steady state and the mutation rate is one mutation per 400aa per year
per individual lineage (a pretty high mutation rate). Well, starting
with 1,200aa in a colony of 1 trillion would give us 3 trillion
sequences of 400aa each evolving at the same time (given that this
1,200aa sequence was released from selective constraints perhaps via
gene duplication). This means that each year 3 trillion sequences out
of 4,000 trillion will be searched out. At this rate, success will be
realized in just over 1,300 years on average (defined
as the evolution of a beneficial lactase function in one member of the
population).
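
Spelled out, the arithmetic behind that figure looks like this (a
sketch; every number is one of the assumptions stated above, not a
measurement):

gap_mutations = 12                    # assumed distance to nearest lactase island
gap_sequences = 20 ** gap_mutations   # ~4.1e15, the "4,000 trillion" gap
colony_size = 1e12                    # assumed steady-state population
frames = 3                            # a 1,200aa sequence = three 400aa windows
rate = 1.0                            # assumed mutations per 400aa per lineage per year

searched_per_year = colony_size * frames * rate   # 3e12 sequences/year
print(round(gap_sequences / searched_per_year))   # ~1365 years on average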

> 6) Same thing, except now we have a completely random sequence of 1200
> total amino acids.
>
> I will accept failure to evolve a selectable beta-galactosidase activity
> in five years as evidence that your math is correct for *that type of
> protein* (even though I really should wait a gazillion years, just to be sure).

Ok, what is your counter argument? If the density of lactases in
sequence space of 400aa was very much less than what I based my above
calculations on, then why were Hall's E. coli so limited in their
ability to evolve a type of function with such a high density of
sequences in sequence space?



> That is because your bogus math does not take the specific ancestral
> sequence and its pre-existing functionality into consideration.

Actually it does. No matter what you start with you cannot get around
the fact that on average your starting points will be a certain
distance from new sequences with new types of functions. This
distance gets exponentially larger, no matter what your starting
sequences are, at higher and higher levels of specified complexity.

For example, let's just say, by a sheer extraordinary stroke of luck
that an ancestral sequence in a bacterial colony just happened to be
one or two mutations away from a new type of function as specified and
complex as a flagellar motility system. Well, of course this highly
complex system would evolve in short order now wouldn't it? Ok, but
how many more such systems would it be able to evolve on average? How
long would it take that colony to evolve another type of function at
that same level of complexity or higher given what it now has to
proceed with? Odds are that everything it has will be gazillions of
years away from any other type of function within such a level of
complexity or higher. In fact, the odds are so great against
evolution at such levels that the witnessing of evolution at such a
level should cause one to seriously look into the almost certain
finding of a pre-existing system that had been lost for a time but
whose code was still there pretty much intact.

For example, cavefish who have lost their eyes still have the code to
make eyes in their genome. It has been shown that a single point
mutation can restore the production of fully formed eyes in the
offspring of these fish. Does this mean that eye evolution has been
demonstrated? Absolutely not. All this shows is that the evolution
of such a highly complex system requires a pre-existing code for this
system that has been shut down by a slight change to the system.
Without this historical existence of sightedness in the ancestors of
these fish, they would never have been able to evolve the ability to
see. The same is true for flagellar motility. I know that you have
suggested that the ability to mutate a flagellar system so that it no
longer works as a motility system, just keeping its TTSS system
intact, and then mutating back the motility function is an example of
high complexity evolution in action. It really is nothing of the
sort. It is on the same level as blind cavefish evolving their eyes
back again. Without the pre-established code already being there and
working at that type and level of functional complexity in the
ancestors of that organism, such levels of complex function would not
evolve in trillions upon trillions of years.

> The interesting thing, of course, is that this experiment (selection for
> the evolution of galactosidase activity) actually has been run and your
> model of calculating the odds has been tested. That is, the prediction
> based on your methods of determining the odds of evolving a new function
> has been subject to test. Did it pass the test for *all* of the
> examples, or only for the examples where you start with a random protein
> or a random sequence?

You must understand that we are talking averages here. What is the
average time required to evolve a new function at a particular level
of complexity? In other words, what is the density of beneficial
functions at various levels of complexity in sequence space? You must
have some sort of idea of the density of beneficial functions in order
to be able to estimate average evolutionary time requirements.
Certainly it was very fortunate that E. coli had at least one and
possibly two sequences within striking distance of a lactase sequence,
but this does not mean that the density of lactase functions can be
adequately estimated by dividing the number of genes in E. coli by the
number of lactase sequences in E. coli. This is a fallacy. By this
method it would seem that the density of lactase sequences is as high
as 1 in 1000 amino acid sequences. This is
obviously incorrect or a beneficial lactase would be no more than 3
mutations away from any 400aa sequence. The evolution of lactase
would be lightning fast at this density in all types of bacteria.
The far more telling evidence is found in the limited ability of lacZ
and ebg negative bacteria to evolve the lactase function over the
course of tens of thousands of generations. That observation gives a
much clearer idea about just how low the density of lactase sequences
really is.



> Remember that no one (except your argument) is proposing that bacterial
> flagella arose in one fell swoop with no intermediate states of utility.
> Quite the opposite.

Obviously that is what you evolutionists fervently believe and is
actually what is required. You must have intermediate steppingstones
that are each selectably beneficial. Your problem is that you don't
have these stones. Your proposed evolutionary pathways for the
flagellar motility system are sorely lacking, involving huge gaps that
have never been crossed. Not even one of your proposed steps in the
evolution of a flagellar motility system has been demonstrated to
evolve in real life - not one. Come on now, where are these
steppingstone functions? They just aren't there because those types
of functions at this level of specified complexity are so far away
from all stepping stones that universes of sequences must be sorted
through before any function at this level can be found.

_________
Howard Hershey <hers...@indiana.edu> wrote in message news:<3FF5C491...@indiana.edu>...



> > > I have asked this repeatedly, so now I will shout. HOW THE BLOODY HELL
> > > DO YOU DEFINE WHAT IS OR IS NOT 'FAIRLY SPECIFIED ORDER'
> > > AND WHAT VALUE DO YOU GIVE IT?
> >
> > Specified order is defined by the degree of constraints required on
> > amino acid positions. In other words, how many positions are
> > completely invariant? How many more positions are partially variant
> > and to what degree? I have repeatedly talked about this so I am quite
> > surprised that you don't seem to remember reading such discussions.
> > Anyway, I will repeat using cytochrome c again as an example:
> >
> > Based on cytochrome c analysis of over 40 species, a minimum of around
> > 80 amino acids are required for this type of function to be realized
> > with around 100 or so amino acids used on average.
>
> What do you mean by "required" in this context? That they must be
> invariant? That they must be hydrophobic? That they must *not* be
> proline? Again, you repeatedly fail to define 'fairly specified' and
> how you determine that an amino acid is 'fairly specified'.

It seems to me that you fail to grasp the difference between the
minimum amino acid requirement and the minimum specificity
requirement. They are two different things. Just because a
particular function needs, say 100aa at minimum, does not mean that
all 100aa are highly specified in their order. Likewise, it seems as
though the cytochrome c function requires at least 80aa at minimum,
but this does not mean that each of these 80aa are highly specified or
"invariant". Do you understand the difference between these two types
of limitations now?

You know, it would really help if you read the entire train of thought
before responding to long paragraphs with single words and
sentences. I often answer many of your questions and statements in
the very next sentence or paragraph, as I did this time.

> > Of these, 30 amino
> > acid positions that are highly constrained - rarely involving more
> > than one type of amino acid.
>
> Rarely is not never. How many are invariant?

You are really hung up on this idea of absolute invariance. I would
dare say that very few if any positions are absolutely invariant -
taken one at a time. The same is true of a 100-character sentence.
This, however, does not mean that this type of function is not highly
constrained. Character positions do not have to be absolutely
invariant in order to be very highly constrained - right?

> It seems to me that you
> count an amino acid as 'fairly specified' if it exhibits *any*
> constraint and treat it as if it were invariant.

Come on now! Are you suggesting that a position limited to no more than
3 different amino acids is not "fairly specified"? Give me a break . . .

> Actually, I haven't
> seen you calculate anything. All I have seen you do is say that there
> are positions that are very highly constrained, positions that are
> somewhat less constrained, and positions that are even less constrained.
> Then you wave your hands and come up with a number.

The math is fairly easy here. Given the constraints listed, you can
calculate the number yourself. You will find that the 10e60 number is
actually being quite generous given these listed constraints
(referenced by Yockey and others). If you think otherwise, do your
own calculation and tell me your results.
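
For the record, one way to set up that calculation (a sketch in Python;
the 30/36/15 position counts are the constraint figures quoted in this
thread, while the 7 and 9 options allowed for the remaining 24 positions
are deliberately generous assumptions):

import math

# (number of positions, amino acid options allowed at each)
constraints = [
    (30, 1),   # highly constrained: effectively one amino acid
    (36, 3),   # vary between only 2 or 3 amino acids
    (15, 4),   # vary between no more than 4 amino acids
    (20, 7),   # remaining positions, generously allowed up to 7
    (4, 9),    # the most variable positions: up to 9 of 20
]

log10_sequences = sum(n * math.log10(k) for n, k in constraints)
print(f"~10^{log10_sequences:.0f} sequences")   # ~10^47, well under 10e60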

> Again, it is well known that all *modern* cytochrome c's have a high
> percentage of evolutionarily constrained sequences.

You mean *functionally* constrained sequences.

> That is because
> cytochrome c is small and most of its amino acids are in contact with
> the substrate. The high degree of evolutionary constraint is what makes
> this sequence particularly useful for analyzing deep phylogeny.

Functional constraints really aren't useful for studying actual
evolutionary relationships. The sequences differ because different
organisms, facing different environmental and phenotypic demands, need
different levels of a particular type of function. Such functional
differences in very different organisms
may have always been there by design. Again, the only way to rule out
design as the only logical explanation for such differences is to show
that mindless evolutionary processes can also explain the differences
beyond the lowest levels of informational complexity.

> The
> same is true for histones, for much the same reason. But you typically,
> and misleadingly, apply these percentages of evolutionary constraint to
> large molecules that show a much, much lower amount of evolutionary
> constraint.

I do not apply these percentages to much larger single molecules. I
have repeatedly said that larger single proteins often have far lower
constraints than do smaller protein functions. Cytochrome c and
histone proteins are very highly constrained - much more so than the
larger lactase enzyme and other such larger enzymes and proteins. I
use the smaller proteins as examples because their level of constraint
is well-known and clearly documented. I use them as illustrations to
show that increased constraint results in an equivalent reduction of
density in sequence space. Larger proteins, though less constrained,
may still be quite rare due to their minimum sequence size
requirement, which is a different type of constraint. Though less
constrained than a cytochrome protein, a lactase function requires
over 4 times as many amino acids at minimum. Still, the clincher
comes when you start considering multiprotein systems where each of the
smaller individual proteins have a fairly high level of amino acid
specificity/constraint. Since all of these proteins are required to
work together at the same time, their combined number of fairly highly
constrained amino acids starts to really add up - into the multiple
thousands of fairly specified amino acids working together at the same
time. For the flagellar system I would say that this number is well
over 5,000aa.
Again, if you disagree with this number, prove me wrong. Tell me what
you think the minimum genetic real estate is to code for a flagellar
motility system.



> > An additional 36 positions only vary
> > between 2 or 3 different amino acids. Another 15 positions vary
> > between no more than 4 different amino acids. Only 4 positions out of
> > 105 positions vary by more than 7 different amino acids. The two most
> > variable positions (60 and 89) vary by only 9 out of 20 possible amino
> > acids.
>
> Yes. I have no problem with the idea that cytochrome c has a higher
> degree of evolutionary constraint than other proteins. There is
> definitely evidence for that. But remember that you are only looking at
> 40 sequences (and are probably overweighted in sequences in metazoan
> animals that diverged recently). Positions that vary by more than 7
> amino acids are *not* highly constrained, when you are only looking at
> 40 sequences, given that, even assuming a random sampling, you wouldn't
> expect to see all 20 amino acids at any single position when you are
> only examining 40 sequences.

Ok, you give me your best numbers. But remember, a limitation to 19
out of 20 is still a constraint. Then, if over 60% of the positions
of a protein are limited to within 3 amino acids with another 15-20%
limited to within 4 amino acids (~80% total), I would call that highly
constrained, and I dare say you would too if you weren't in a debate
with me.

> More importantly, however, there is no way you can use these numbers
> from any *modern* protein or system to say *anything* about how
> difficult or easy it was to evolve that system. The numbers are
> completely irrelevant.

They are not completely irrelevant. They are the best way that we
have of understanding the density of such types of functional
sequences in sequence space. And, if they were completely irrelevant
you wouldn't work so hard at arguing against them and what they
obviously mean.

> The *only* model of evolution that you can use
> these numbers to test is the model your math tells us you are testing.
> Specifically, these numbers can tell us only the odds of the modern
> system evolving from the starting point of a random sequence by a
> process of a complete random walk with no possible intermediate states
> of utility. Since no one proposes that any *modern* protein or system,
> with rare exception like the nylonase case, evolves this way, your
> numbers are irrelevant *even if* they are correct.

Ok, how else do new types of functions evolve? Explain it to me
again. It seems to me that to get a new type of function you must
evolve new sequences that actually have new types of beneficial
functions. Higher levels of functional complexity had to involve the
previous evolution of steppingstone functions, each of which was
selectably advantageous. If these steppingstones are not there, then
evolution is impossible. If they are there, then evolution is not
only possible, but easy. Do you know another way?

> > It is quite obvious that the cytochrome c type of function is rather
> > constrained by both its minimum amino acid requirement
>
> You *still* haven't told me how you calculated "minimum amino acid
> requirement" besides say that some positions in cytochrome c are
> strongly conserved, others less conserved, and others still less. Then
> we get a wave of your hand and a number.

Again, the minimum amino acid requirement is different from the
minimum level of constraint. The minimum amino acid requirement is
not a calculated number, but is estimated based on the shortest
sequence found in a living thing having this type of function.

> > as well as the
> > fairly high degree of specificity required by the sequencing of these
> > amino acids. Well over 60% of this protein is restrained to within 3
> > amino acid options out of the 20 that are possible.
>
> The average time for a 100% probability of replacement of a single
> selectively neutral amino acid is 100 million years, Sean.

Not at all. Taking a mutation rate of 1e-6 mutations/generation, a
population of just 10 billion would realize such a "neutral" mutation
in just one generation in many of its individuals. Of course, many of
the variations in such proteins as cytochrome c are not neutral, but
are functionally beneficial and maintained as such by natural
selection (see below).

> If two of
> the sequences you examine were eutherian mammals (which separated some
> 60 million years ago), that would mean that 40% of their sequence would
> be identical *EVEN IF* every amino acid in their cytochrome c were
> selectively neutral and free to drift.
>
> And it would take around 300
> million years for all the possible amino acids due to neutral selection
> to be reached from an ancestral sequence *by chance alone* (selection,
> of course, works much, much faster -- indeed, in certain environments
> one generation is enough time) because you would need mutation at more
> than one site in a codon. Have you taken this into account at all? I
> sure can't tell, because all you do is make a series of statements and
> then wave your hand to come up with a number. Not that it really
> matters, since even if the number were correct, it would be irrelevant.

Consider that the average mutation rate for a given gene in all
creatures, is about 1 x 1e-6 mutations per gene per generation. That
means that a given gene will mutate only one time in one million
generations on average. Consider that single celled organisms have a
much shorter generation time than multi-celled organisms on average.
For example, the bacteria E. coli have a minimum generation time of 20
minutes compared to the generation time of humans of around 20 years.
With a gene being mutated every 1 to 10 million generations in E.
coli, one might think this would be a long time. However, each and
every gene in an E. coli lineage will get mutated once every 40 to 80
years. So, in one million years, each gene will have suffered at
least 10,000 mutations. Also consider that the population of single
celled organisms on earth is a lot higher than the populations of
multicelled organisms. For example, there are almost 6 billion people
living on earth today but more than 100 billion E. coli living inside
just one person's intestines.
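
A sketch of that arithmetic (Python; the 1e-6 per-gene rate and the
20-40 minute generation times are the stated assumptions):

MINUTES_PER_YEAR = 60 * 24 * 365          # 525,600

def years_per_gene_hit(gen_minutes, rate=1e-6):
    # Average years for one lineage to mutate a given gene once:
    # 1/rate generations, each lasting gen_minutes.
    return (1.0 / rate) * gen_minutes / MINUTES_PER_YEAR

print(years_per_gene_hit(20))         # ~38 years
print(years_per_gene_hit(40))         # ~76 years: the "40 to 80" range
print(1e6 / years_per_gene_hit(40))   # ~13,000 hits per gene per million
                                      # years, i.e. "at least 10,000"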

Now, cytochrome c phylogenies are generally based on analysis of
certain subunits of cytochrome c which range in number of amino acids
up to a maximum of about 600 or so. This would translate into a
minimum of at least 1,800 nucleic acids in DNA coding for this subunit
of cytochrome c protein. Note that the tetrahymena species are about
50% different from all other creatures. It seems then that all the
creatures would have experienced at least a 25% change in their
genetic codes from the time of the common ancestor. So how many
generations would it take to achieve this 25% difference?

Taking 25% of 1,800 gives us 450 mutations. Let's say that the average
mutation rate is one mutation per 1,800 nucleic acids per one million
generations. For a steady state population of just one individual in
each generation it would take about 450 million generations to get a
25% difference from the common ancestor. With a generation time of 20
minutes (i.e., E. coli), that works out to be about 342,000 years.
However, with a steady state population of say a trillion trillion
individuals (the total number of bacteria on earth is somewhere around
five million trillion trillion or 5 with 30 zeros following), one
might expect that the number of generations required to get a 25%
difference would be a bit less. So, for bacteria, the 25% difference
from the common ancestor cytochrome c, might have been achieved
relatively rapidly given the evolutionary time frame (a couple hundred
thousand years or so).

The question is then, if bacteria can achieve such relatively rapid
neutral genetic drift, why are they not more wide ranging in their
cytochrome c sequences? It seems that if these cytochrome c sequence
differences were really neutral differences, that various bacterial
groups, colonies, and species, would cover the entire range of
possible cytochrome c sequences to include that of mammals. Why are
they then so uniformly separated from all other "higher" species
unless the cytochrome sequences are functionally based and therefore
statically different due to the various needs of creatures that
inhabit different environments?

For example, bacteria are thought to share a common ancestor with
creatures as diverse as snails, sponges, and fish dating all the way
back to the Cambrian period some 600 million years ago. All of these
creatures are thought to have been around quite a long time - ever
since the "Cambrian Explosion." In fact, they have all been around
long enough and are diverse enough to exhibit quite a range in
cytochrome c variation. Why then are their cytochrome c sequences so
clustered? Why don't bacteria, snails, fish, and sponges cover the
range of cytochrome c sequence variation if these variation
possibilities are in fact neutral? In other words, why are there not
at least some types of bacteria that share sequence identity with
humans?

I propose that the clustered differences that are seen in genes and
protein sequences, such cytochrome c, are the result of differences in
actual function that actually benefit the various organisms according
to their individual needs. If the differences were in fact neutral
differences, there would be a vast overlap by now with complete
blurring of species' cytochrome c boundaries - even between species as
obviously different as humans and bacteria. Because of this, sequence
differences may not be so much the result of differences due to random
mutation over time as they are due to differences in the functional
needs of different creatures. I think that the same can be said of
most if not all phylogenies that are based on genotypic differences
between creatures.

In 1993, Patterson, Williams, and Humphries, scientists with the
British Museum, reached the following conclusion in their review of
the congruence between molecular and morphologic phylogenies:

"As morphologists with high hopes of molecular systematics, we end
this survey with our hopes dampened. Congruence between molecular
phylogenies is as elusive as it is in morphology and as it is between
molecules and morphology. . . . Partly because of morphology's long
history, congruence between morphological phylogenies is the exception
rather than the rule. With molecular phylogenies, all generated
within the last couple of decades, the situation is little better.
Many cases of incongruence between molecular phylogenies are
documented above; and when a consensus of all trees within 1% of the
shortest in a parsimony analysis is published structure or resolution
tends to evaporate."

http://naturalselection.0catch.com/Files/geneticphylogeny.html


> > That is a
> > significant constraint wouldn't you say? In fact the most generous
> > estimates of the total number of possible cytochrome c sequences in
> > sequence space that I have come across, based on these constraints,
> > suggest no more than 10e60 cytochrome c sequences exist. If you know
> > something different than I have suggested here, please do show me
> > otherwise.
>
> That certainly is a large number of potential different cytochrome c
> sequences (that is, sequences with quite significant cytochrome c
> activity). And the question is, of what possible relevance is that
> knowledge to what you are saying wrt the impossibility of evolution?
> Unless, of course, your logic is that cytochrome c sequences arise by a
> completely random process from a random sequence?

This number has to do with the density of sequences with this type of
function in sequence space. Knowing the density of beneficial
functions is the only way to understand the potential and limits of
evolutionary processes since evolution works via a crossing of
functionally beneficial steppingstones to new types of functions. If
the steppingstones aren't close enough, evolution doesn't happen.

> > Now you ask, "What value do I give such numbers?" What does this
> > estimate mean? Given that the total number of possible sequences in
> > sequence space requiring a minimum of 80 amino acids is well over
> > 10e100, the ratio of cytochrome c sequences to non-cytochrome c
> > sequences is less than 1 in 10e40 sequences.
>
> SFW? You keep coming up with this bogus ratio that presumes that
> cytochrome c, or whatever protein or system one is talking about, is
> derived by starting from a random sequence and generating the final
> result by a completely random walk with no possible states of
> intermediate utility. And then you turn around and deny that that is
> what you are saying.

Not at all. There could be intermediate stepping stone functions
between the original starting point and a cytochrome c function.
However, the odds that these steppingstones are close enough to a
cytochrome c sequence, or to any other beneficial sequence within that
level of complexity, are determined through an understanding of the
functional density of sequences within sequence space. Although
highly specified, the cytochrome c function does not require all that
many amino acids at minimum. Its function is certainly within
striking distance of what I suspect most "original" genomes would have
to begin with. However, going very far above this level of specified
complexity becomes a really big problem really fast.

> > This translates into an
> > average gap of 30 amino acid changes between various islands of
> > cytochrome c sequence clusters within sequence space.
>
> And how is this "average gap" calculated from those two numbers? Be
> precise.

This isn't higher math here. If the sequence space gap that must be
crossed is 1 in 10e40, then, since 20^30 is roughly 10e39, that works
out to slightly over 30 specified amino acid positional changes on
average.
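
For the curious, the arithmetic can be checked in a few lines of
Python (a minimal sketch; the 20-letter amino acid alphabet and the 1
in 10e40 ratio are simply taken from the text above):

import math

# Solve 20^n = 10^40 for n: the number of fully specified amino acid
# positions whose simultaneous change spans a gap of that size.
gap = 1e40        # one beneficial sequence per 10^40, per the text
alphabet = 20     # standard amino acid alphabet

n = math.log(gap) / math.log(alphabet)
print(f"20^{n:.1f} ~= 1e40")   # n ~= 30.7, i.e. "slightly over 30"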

> And is that or is that not the "average gap" between a
> *random* protein or a *random* sequence and some sequence that has
> cytochrome c activity, just as all your other calculations are based on
> putative gaps between *random* proteins and *random* sequences and some
> teleologically determined end point with the proviso that only a random
> walk with no functional intermediates is possible between the two states.
>
> Of course, as my little exercise with ebg shows, evolution does not work
> by starting with a *random* protein or a *random* sequence. And all
> that matters is the number of mutational steps required to generate a
> selectable activity even if that activity is not the end activity in
> question. IOW, your numbers are totally irrelevant. Not *even* wrong.
> But as useful as knowing the distance between earth and the furthest
> galaxy is to determining how one gets from L.A. to San Francisco.
> Utterly irrelevant.

It is not utterly irrelevant. Don't you understand the concept yet?
These estimates help determine the odds that L.A. will actually be as
close to San Francisco as it is. If the average distance from one
beneficial function to another is the size of the universe, that tells
you not to put too much money on the bet that, starting from L.A.,
some new beneficial place like San Francisco will be just a few
hundred miles away. That is what Las Vegas is all about - predicting
the average amount of time it takes someone to win. If the average
amount of time it takes for evolution to "win" a new type of function
at a particular level of complexity is a trillion years, what does it
matter if it happens to win tomorrow? In the end, the average is
still trillions of years. Just like gambling in Las Vegas, if you
keep playing the evolution game too long, even if you have an early
"win", you will eventually lose.

> > In other words,
> > if a particular organism that did not have a cytochrome c function but
> > would benefit from this type of function, its current proteins would
> > differ from the nearest potential cytochrome c sequence by an average
> > of over 30 amino acid positions.
>
> Evolution would not *start* with the average protein. It would start
> with the extreme end of the bell-shaped distribution of proteins that is
> only 1 or 2 amino acids away from a selectable function.

And what are the odds of that? At higher and higher levels of
complexity, the odds that a new type of beneficial function will only
be "just 1 or 2 steps away" becomes more and more remote in an
exponential fashion. The average distance is an extremely important
concept. It means that just because it happened once does not mean
that it will happen again, if ever. The averages get so enormously
small in short order that betting that evolution will succeed over
time becomes an exercise in insanity.

That's all for now . . .

Sean
www.naturalselection.0catch.com

Von Smith

unread,
Jan 3, 2004, 2:37:22 PM1/3/04
to
seanpi...@naturalselection.0catch.com (Sean Pitman) wrote in message news:<80d0c26f.04010...@posting.google.com>...

I did an analysis of 43 Presidents of the United States and have
concluded that the minimum size needed to achieve a Presidency
function is 5'4" (the height of James Madison). No shorter person can
be President. Now, you can easily prove me wrong here by finding a
functional President who was less than 5'4". Do you know of such a
President who actually served at any time in the United States?

This is exactly the argument you are using here. Do you see the
problem with it? Not only is your logic flawed, but your conclusion
appears to be at odds with what is known about protein chemistry (if
posters such as Howard Hershey, Ian Musgrave, "sweetness", and
"Deaddog" can be trusted as knowledgeable in the field).

Until you substantiate your claims about minimum requirements with
something substantial, there is no need to rub your nose in a
counter-example. It is sufficient to note that your assertion is
neither supported nor consistent with what we know about how proteins
work. I have seen several posters explain to you how they actually
*do* work. I have not seen you draw upon, or for that matter even
demonstrate, any such knowledge.

One question I have is: even if there is some minimum length, how is
this length relevant to discussing the evolvability of a lactase
function from a gene coding for a protein more than twice that length?
Presumably you want to talk about the evolvability of a function
based on its putative density in some "sequence space", and I would
guess that you are trying to argue that the relevant sequence space is
that of the minimum length. But that makes no sense: the ebg protein
isn't searching the 400aa sequence space; it can't. So why, exactly,
does this "minimum length", even if it exist, interest you here?

Von Smith
Fortuna nimis dat multis, satis nulli. ("Fortune gives too much to many, enough to none.")

Von Smith

unread,
Jan 3, 2004, 3:01:04 PM1/3/04
to
> Howard Hershey <hers...@indiana.edu> wrote in message news:<3FF4957C...@indiana.edu>...
>

<snip>

>
> > > In other words,
> > > if a particular organism that did not have a cytochrome c function but
> > > would benefit from this type of function, its current proteins would
> > > differ from the nearest potential cytochrome c sequence by an average
> > > of over 30 amino acid positions.
> >
> > Evolution would not *start* with the average protein. It would start
> > with the extreme end of the bell-shaped distribution of proteins that is
> > only 1 or 2 amino acids away from a selectable function.
>
> And what are the odds of that?

Yes, exactly, what *are* the odds? Switching back momentarily to the
example of lactase activity in E. coli, there are at least two other
proteins out of the few thousands it produces that are *known* to be 1
or 2 amino acids away from a selectable lactase function. Is this
frequency consistent with the odds that you would have calculated
based on your methods? Apparently not, which suggests that there is
something wrong with your calculation.

One possibility is that both your underlying model and your
calculation are basically correct, but that there is some other
significant factor at work (such as intervention by an intelligent
agent) which affects observed frequency of evolving lactase functions.
For obvious subjective reasons, this is the possibility you prefer
(as I understand your "spare tire" argument about the ebg gene).

Another is that your underlying model is right but your calculation is
wrong, and that the actual probability (or "density in sequence
space") is in fact on the order of 1 in a few thousand. This is the
explanation you have incorrectly attributed to me, although I do
suspect that the ratio of functional sequences is somewhat higher than
you want to acknowledge.

Yet another possibility is that your underlying model is wrong, in
which case your calculations, accurate or not, are irrelevant. This
seems to me the most likely explanation, the one most consistent with
what other posters have told you about how proteins work; it is a
possibility you do not seem to ever acknowledge or really address.


> At higher and higher levels of
> complexity, the odds that a new type of beneficial function will only
> be "just 1 or 2 steps away" becomes more and more remote in an
> exponential fashion. The average distance is an extremely important
> concept. It means that just because it happened once does not mean
> that it will happen again, if ever. The averages get so enormously
> small in short order that betting that evolution will succeed over
> time becomes an exercise in insanity.

Observation seems to be at odds with all your claims and
"calculations", and require explanation. Do you have one?

RobinGoodfellow

unread,
Jan 3, 2004, 9:06:46 PM1/3/04
to
seanpi...@naturalselection.0catch.com (Sean Pitman) wrote in message news:<80d0c26f.03123...@posting.google.com>...

> RobinGoodfellow <lmuc...@yahoo.com> wrote in message news:<bsd7ue$r1c$1...@news01.cit.cornell.edu>...

[snip to the point where I left off]

> > Again, I totally agree that a simple Monte-Carlo
> > process is a pitiful model for evolution - but Sean's statistics are way
> > off even in when applied to this model.
>
> I fail to see how you have supported this statement of yours. My
> statistics do seem to match not only the exponentially increasing
> ratios found in language systems like English

Oh, do they now? I think it's time we laid this little baseless
assertion of yours to rest. Of course, to do that, we need to
accurately model how languages actually "evolve", rather than apply
your strawman "one letter at a time" model to the concept. We know
that new words rarely arise in a language spontaneously. This is
especially true for longer words, that have a lower "density" in what
you would call "word space". Rather, such words are often borrowed
from other languages, or made by combining standard prefixes,
suffixes, and roots. Furthermore, many English words derive from
archaic versions of themselves, which are no longer used (e.g.
"beneficial"), but certainly were so in the past. Finally, the
function of words is to convey meaning - therefore, a set of words
that clearly conveys a meaning is still functional even if some of the
words are misspelt.

So, if you wish to model language via an evolutionary process (that
bears at least some semblance to actual biological evolution) you must
support the following mutational events: 1) one-letter changes (point
mutations), 2) insertion and deletion (genomic frame shifts), 3)
recombinations of shorter words, and standard prefixes, suffixes,
etc. (domain shuffles), 4) incorporation of words from other
languages (horizontal transfer), 5) word or word part duplication
(gene duplication - much more useful for biological systems than for
English language, since the latter deliberately tends to avoid
repetition). You must allow for misspellings, since they rarely alter
the meaning (function) of longer words and especially sentences, and
thus strings containing misspellings can remain perfectly functional.
Once you allow for this type of model, and give me a dictionary of
short starting functional words, I can easily generate a huge
repository of words and arbitrarily long meaningful sentences using
the above evolutionary operations, with a bare minimum of neutral
drift along the way. (e.g. I can go from "philosophy" to "teleology"
in just two simple steps, producing a new beneficial word as the
intermediary.) Now, I am not saying I can generate *any* word or
sentence that way - but I can meet any reasonable (for the English
language) complexity metric that you propose.
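
As an illustration only - a minimal sketch using a hypothetical toy
dictionary, covering just two of the five operators listed above
(point mutation and insertion) - here is how such a model generates
new "functional" words from an existing one:

# Hypothetical toy dictionary; a real experiment would use a full word list.
DICTIONARY = {"cat", "coat", "cost", "cast", "care", "core", "score"}
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def point_mutations(word):
    # Change one letter at a time (point mutations).
    for i in range(len(word)):
        for c in ALPHABET:
            yield word[:i] + c + word[i + 1:]

def insertions(word):
    # Insert one letter at a time (frame-shift-like insertions).
    for i in range(len(word) + 1):
        for c in ALPHABET:
            yield word[:i] + c + word[i:]

def viable_neighbors(word):
    # Keep only "functional" variants: those found in the dictionary.
    candidates = set(point_mutations(word)) | set(insertions(word))
    candidates.discard(word)
    return candidates & DICTIONARY

print(viable_neighbors("cat"))   # {'cast', 'coat'} - one step each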

Note that the model I described above is a much better approximation
of biological evolution than the point mutations model that you insist
upon. And it would work on the English language quite well, despite
the fact that our language is not nearly as versatile as actual
biological systems. However, there is one essential feature of this
model that is true of biological systems as well, and that you
consistently fail to grasp, so let me spell it out: evolutionary
processes produce complex systems (i.e. words, proteins, sentences,
flagella) not by simple alterations to unrelated complex systems, but
by incrementally combining simpler functional components, until a
solution that yields a novel beneficial function is stumbled upon.
Then, simpler adjustments (such as point mutations) can take place,
making the new system *specific* for performing the novel beneficial
function.

> and information systems
> like functional proteins and genes, but they also match statistical
> programs used in computer software development and the like. This is
> why computers cannot evolve their own software programs beyond the
> lowest levels of functional complexity. To go very far beyond the
> informational complexity that they already have they require the
> intelligence and creativity of human programmers to get across these
> vast neutral gaps that simply cannot be searched out in any sort of
> reasonable amount of time by mindless processes.

It is rather amusing how you keep misinterpreting Lenski's research.
The whole point they have demonstrated is that evolution can produce
systems of high complexity as long as there is, to use your own words,
a "ladder of complexity" to climb. That is, the point of Lenski's
demonstration was that as long as there are beneficial functions of
intermediate complexity, evolutionary processes can combine the
components performing these functions to yield qualitatively
different, novel functions of higher and higher complexity. However,
if no such functions exist (there is no ladder of complexity - all
systems below a certain level of complexity are unable to perform a
beneficial function necessary for an organism), then evolutionary
processes cannot simply poof complex systems out of thin air.
Evolution can only operate on existing functional components,
incrementally re-combining and modifying them to occasionally produce
something new: it doesn't just generate brand-new systems from
scratch. It is therefore an easily falsifiable prediction of
evolution that living organisms are not going to contain many large
functional molecules or molecular systems that look like they've been
generated completely from scratch. Care to falsify this prediction?

As for the need for intelligent input in the Lenski experiment - the
reason they needed to specify the functions of intermediate utility is
because they have been trying to attain a specific teleological goal,
and needed certain steps to reach this goal. If rather than achieving
a certain goal, they had set up their experiment so as to attain a
function of a certain level of complexity, they would not need to
specify functions at intermediate levels of complexity explicitly -
just to ensure that a sufficiently wide spectrum of such functions
exists (as it certainly does in biology). But this is a much harder
computational problem that presents interesting avenues for further
ALIFE research. Which raises the question of why no creationists or
IDers seem to be taking part in it? (to my knowledge - please correct
me if I am wrong.)

> So, please do show me
> how your Monte-Carlo technique can search an increasing state space
> and find beneficial states in a linear fashion with each increase in
> the minimum informational complexity requirement.

I never said that it could. It is you who believes that evolution
works in such fashion, not I. What I said is that your calculations
do NOT, in any way, demonstrate that it can't. They are completely
irrelevant for the problem you are trying to analyze, but you simply
do not realize it. Even if your assertions that evolution cannot
produce systems at a certain level of complexity are correct (and that
such levels of complexity are indeed regularly seen in biological
organisms), you do not have the statistics to back it up. Don't feel
too bad, though - no one does. The problem is far too complicated to
analyze statistically at the moment: it will require far greater
levels of understanding of biological systems than we currently
possess, along with far better computational tools than are currently
available. Strides are being taken in this direction, however, and
they do not bode too well for your position. I'll get to that below.

> > His probabilities would only be
> > valid if evolution worked by repeatedly generating N-amino-acid
> > sequences de novo *every time*, with selection only keeping "beneficial"
> > sequences. That is, Sean's calculation reflects the probability of
> > finding a desired state with property X by blind, uniform random
> > sampling of N-dimensional space. The best thing that could be said
> > about such an attempt to model evolution is that it is laughable.
>
> How is this laughable when you evolutionists can't seem to come up
> with any other way to explain how new types of functions that require
> at least a couple thousand fairly specified amino acids working
> together at the same time can evolve?

First of all, you haven't even come close to demonstrating that there
are functions that *require* at least a couple of thousand *fairly
specified* amino acids working together to evolve. When presented with examples
of very large proteins evolving, you quickly (and, as far as I can
tell, correctly) pointed out that most of the amino acids are at
functionally neutral positions, and thus are allowed to vary. But
you've done nothing to demonstrate that some 2000 amino acids in the
proteins composing, say, the flagellum need to be "fairly specified".
You've tried to do something like this for cytochrome C, but even
there your analysis badly fails. You claim that a majority of
positions in Cytochrome C are constrained to only a few amino acids,
making cytochrome C relatively specific for electron transport.
However, that does not mean that the precursor protein performing the
rudimentary electron transport function required highly specific amino
acids. Rather, once rudimentary electron transport was developed in
the common ancestor, subsequent mutations produced variants that could
perform the function more efficiently. Such mutations would not be
reversible, since a reverse mutation would yield an electron
transporter of lesser efficiency, and would be selected out. As the
result, in all modern organisms, the electron transport function is
relatively constrained - though it did not need to be the case for
ancestral organisms. Note that such functions are by far the
exception rather than the rule, and there is a perfectly workable
model for explaining how they came about.

Secondly, note what I referred to as "laughable". The preceding
sentence was (I paraphrase): "Sean's calculation reflects the probability of finding
a desired beneficial state by blind, uniform random sampling of
N-dimensional space." That is, the search technique for which your
calculations apply is "generate a completely random N-amino acid
sequence, see if it works". But there is not even a biological
mechanism which could be used to generate a long polypeptide chain
completely de novo, and do so repeatedly and at random. In other
words, you are proposing (probably without realizing it) to model
evolution using non-existent biological mechanisms. That is what I
find laughable.

> What method do you propose to
> explain such levels of functional diversity within living things? How
> do you get from one type of function at such a level to another type
> of function within this same level of specified complexity?

You generally don't. That's the key point. Complex functional
systems do not result from modification of equally complex, unrelated
functional systems. They are built up from the combination and
modification of simpler components, that already exhibit some
beneficial function for the organism. Specificity comes about even
later, as the new functions themselves start to play an important role
in the organism, and selection takes over with respect to mutations
affecting these functions.

> > But
> > it appears that Sean does not realize that he is making this mistake,
> > and keeps repeating his probabilities like a broken record.
>
> And you guys keep repeating your non-supported assertions like a
> mantra. You keep saying I'm crazy and that my ideas are laughable,
> but you have presented nothing to significantly counter my position.

I am pointing out that you are making elementary statistical errors,
and are not even realizing that you are doing so. Your ideas may be
100% correct for all I care, but, from a statistical point of
view, you've done nothing to back them up.

> My hypothesis remains untouched and my predictions still hold.

Actually, many posters (Ian Musgrave, sweetness, Deaddog, Howard
Hershey) have presented you with a variety of examples of evolution
that could conceivably refute your hypothesis. You have dismissed
them, identifying potential weaknesses in their arguments. However,
you've refused to apply the same criteria (e.g. count only the
conserved, "specified" amino acids, rather than the total number of
amino acids) in your so-called unevolvable systems. You've done
nothing to revise or adjust your argument in light of the evidence
you've been shown. That does not a credible position make.

> What have you presented besides a bunch of non-supported "just-so" and
> "trust me" statements? Where is your falsifiable evidence?

All around you. I gave you one above - very few large proteins are
going to look like they've been produced "de novo" (i.e. will have no
sequence or structural homologs), instead of the result of domain
shuffling, duplication and divergence, etc. Care to falsify it?

As for your claims, what sort of falsifiable evidence would you accept
(since you've rejected everything presented to you so far)? Real-time
evolution of a flagellum-like motility system? Given the time scales
it is thought to take in real life, that does not seem possible. Even
if it were, we would first need to learn its exact evolutionary
history, the environmental condition under which it arose, e.t.c. In
an experiment, we would need to replicate these things precisely - and
then, what would stop you or some other IDer from claiming that it was
the intelligent input involved in the experimental setup that gave
rise to the flagellum, rather than so-called mindless processes (just
you do in the case of Lenski's research)?

> The best
> that I can see is that you guys keep falling back on the philosophical
> position that given enough time anything is possible via the
> extraordinary creativity of The Mindless - even beyond the most
> miraculous creations of mankind.

The position that given enough time anything that has a non-zero
probability of occurring will occur is not philosophical: it is
mathematically demonstrable, as you would know if you had a solid
background in probability and statistics. But that is not the
question. The question is: was there enough time, and enough raw
material to work with, for evolution to produce what we see in nature
today. If evolution worked the way you think it works, the answer
would be a most resounding "no". However, given the way evolution
actually does operate, what we have seen it (and similar processes) do
on short time scales, and what we've extrapolated to geological time
frames, it appears very likely that the answer is "yes".

> Basically evolution explains everything - even without demonstration -
> and therefore nothing.

Oh, really now? Does evolution explain why apples fall down to the
ground, and not fly straight up? Does evolution explain why covalent
bonds are stronger than ionic bonds? ID explains all those things
trivially: but evolution only explains a range of observations in
biology, and does so with as much detail as possible given the current
level of knowledge. Your claim that it does so without demonstration
is outrageous, as many aspects have indeed been demonstrated. We have
seen the evolution of novel enzymes, of small multi-protein systems,
the rise of new species, the agreement of genetic evidence with major
evolutionary predictions, the application of evolutionary methods to
the solution of complex problems in other fields. All in a short,
short span of 100 years. Just because we can't demonstrate within a
span of a few years every little thing that an evolution denier
demands (especially
when such demands are completely unreasonable), doesn't mean that the
theory is broken. Apply the same standard to, say, General
Relativity, and you can toss that theory out in two seconds flat.

> It is a weak historical hypothesis at best.
> It is not falsifiable by any sort of real time genetic experiment -
> such as a Pasteur-like experiment. Every time a prediction fails, you
> evolutionists just fall back on your philosophy and say, "Oh well, I
> guess that particular level of evolution just requires millions of
> years - but it certainly happened within 4 billion years that's for
> sure!" Really, there is no way to falsify such a philosophical
> position.

Baloney. Again, above I gave you a prediction of evolution that can
easily be falsified. On a more basic level, the absence of the twin
nested hierarchy would be a fatal blow to evolution. The lack of a
mechanism to introduce novelty into the genome, or to recombine
functional genetic elements, would again be a death knell. The
preponderance of chimaeric life-forms, especially among higher
organisms, would once again pose an insurmountable obstacle to the
theory as it currently stands. These are just a few ways to falsify
evolution off the top of my head. Now, why don't you give me just one
way to falsify ID?

> Statistically you have nothing.

No. Statistically, you have nothing. Statistically, we have quite a
bit, if you bothered to look. We have population genetics models that
verify that observed frequencies of individual genes in populations
are consistent with observed evolutionary rates of change. We have
rudimentary models of protein evolution, showing how the observed
range and diversity of proteins structures is consistent with
evolutionary processes (Deeds et al., Biophys J, 85(5):2962-72). We
have studies showing how complex biochemical networks arise in the
genome from evolutionary processes (Von Mering et al., PNAS,
100(26):15428-33). We have real-time computational demonstration of
how complex functions can arise as the result of evolutionary change
(like Lenski's work, that you insist on misinterpreting). That, along
with hundreds of detailed evolutionary analyses of a broad spectrum of
organismal lineages, on both short and very long time scales. What we
do not yet have is a unified, computationally verifiable model of
evolution over geological timescales. Probably, we won't have such a
model for a while, since there are considerable challenges yet to
overcome - but as I said, steps are being taken in this direction. It
might even turn out (highly unlikely though it is) that when we get
there, and plug the model in, it is you who will turn out to be right.
But if that is the case, we won't have you to thank for it - since
neither you, nor any creationist or IDer to my knowledge, have done or
are doing anything to demonstrate it.

> Statistically it is very
> clear that evolution, as an explanation for the variety and levels of
> functional complexity that we find in all living things, is simply
> untenable.

Only if you do not understand the statistics involved. Which, I am
sorry to say, you do not - as I hope I've demonstrated above, and in
my previous post.

> Of course, you are free to hold whatever philosophical
> position that you want, but if you hope to convince those who actually
> wish to consider the statistical problems involved, you will have to
> do much better than you have done so far to hold onto your illusions
> of "scientific superiority".

All I can say is that those who wish to consider the statistical
problems involved might benefit from understanding the statistical
problems involved. If you really think all you need do to model
evolutionary processes is to raise 20 to the power N, you really,
really, really, have a lot to learn.

> Sean
> www.naturalselection.0catch.com

Cheers,
RobinGoodfellow.

Sean Pitman

unread,
Jan 3, 2004, 9:18:17 PM1/3/04
to
lmuc...@yahoo.com (RobinGoodfellow) wrote in message news:<81fa9bf3.04010...@posting.google.com>...

> seanpi...@naturalselection.0catch.com (Sean Pitman) wrote in message news:<80d0c26f.03123...@posting.google.com>...
>
> Good gravy! That was so wrong, it feels wrong to even use the word
> "wrong" to describe it. All I can recommend is that you run, don't
> walk, to your nearest college or university, and sign up as quickly as
> you can for a few math and/or statistics courses: I especially
> recommend courses in probability theory and stochastic modelling.
> With all due respect, Sean, I am beginning to see why the biologists
> and biochemists in this group are so frustrated with you: my
> background in those fields is fairly weak - enough to find your
> arguments unconvincing but not necessarily ridiculous - but if you are
> as weak with biochemistry as you are with statistical and
> computational problems, then I can see why knowledgeable people in
> those areas would cringe at your posts.

With all due respect, what is your area of professional training? I
mean, after reading your post I dare say that you are not only weak in
biology, but statistics as well. Certainly your numbers and
calculations are correct, but the logic behind your assumptions is
extraordinarily fanciful. You sure wouldn't get away with such
assumptions in any sort of peer reviewed medical journal or other
statistically based science journal - that's for sure. Of course, you
may have good success as a novelist . . .

> I'll try to address some of the mistakes you've made below, though I
> doubt that I can do much to dispel your misconceptions. Much of my
> reply will not even concern evolution in a real sense, since I wish to
> highlight and address the mathematical errors that you are making.

What you ended up doing is highlighting your misunderstanding of
probability as it applies to this situation as well as your amazing
faith in an extraordinary stacking of the deck which allows evolution
to work as you envision it working. Certainly, if evolution is true
then you must be correct in your views. However, if you are correct
in your views as stated then it would not be evolution via mindless
processes alone, but evolution via a brilliant intelligently designed
stacking of the deck.

> > RobinGoodfellow <lmuc...@yahoo.com> wrote in message news:<bsd7ue$r1c$1...@news01.cit.cornell.edu>...
>
> > > It is even worse than that. Even random walks starting at random points
> > > in N-dimensional space can, in theory, be used to sample the states
> > > with a desired property X (such as Sean's "beneficial sequences"), even
> > > if the number of such states is exponentially small compared to the
> > > total state space size.
> >
> > This depends upon just how exponentially small the number of
> > beneficial states is relative to the state space.
>
> No, it does not. If you take away anything from this discussion, it
> has to be this: the relative number of beneficial states has virtually
> no bearing on the amount of time a local search algorithm will need to
> find such a state.

LOL - You really don't have a clue how insane this statement is?

> The things that *would* matter are the
> distribution of beneficial states through the state space, the types
> of steps the local search is allowed to take (and the probabilities
> associated with each step), and the starting point.

This distribution of states has very little if anything to do with how
much time it takes to find one of them on average. The starting point
certainly is important to initial success, but it also has very little
if anything to do with the average time needed to find more and more
beneficial functions within that same level of complexity. For
example, if all the beneficial states were clustered together in one
or two areas, the average starting point, if anything, would be
farther away than if these states were distributed more evenly
throughout the sequence space. So, this leaves the only really
relevant factor - the types of steps and the number of steps per unit
of time. That is the only really important factor in searching out
the state space - on average.

> For an extreme
> example, consider a space of strings consisting of length 1000, where
> each position can be occupied by one of 10 possible characters.

Ok. This would give you a state space of 10 to the power of 1000 or
1e1000. That is an absolutely enormous number.

> Suppose there are only two beneficial strings: ABC........, and
> BBC........ (where the dots correspond to the same characters). The
> allowed transitions between states are point mutations, that are
> equally probable for each position and each character from the
> alphabet. Suppose, furthermore, that we start at the beneficial state
> ABC. Then, the probability of a transition from ABC... to BBC... in a
> single mutation 1/(10*1000) = 1/10000 (assuming self-loops - i.e.
> mutations that do not alter the string, are allowed).

You are good so far. But, you must ask yourself this question: What
are the odds that out of a sequence space of 1e1000 the only two
beneficial sequences with uniquely different functions will have a gap
between them of only 1 in 10,000? Crossing this tiny gap would
require a random walk of only 10,000 steps on average. For a
decent-sized population, this could be done in just one generation.
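
A minimal Monte-Carlo sketch of the restarting walk just described
(the length-1000 strings, 10-character alphabet, single mismatched
position, and allowed self-loops are all taken from the exchange
above; the particular position and character are arbitrary):

import random

rng = random.Random(1)
LENGTH, ALPHABET_SIZE = 1000, 10
DIFF_POS, TARGET_CHAR = 0, 7    # the one mismatch (arbitrary choices)

def steps_to_hit():
    # The walk restarts after every step (as stipulated above), so we
    # only need the per-step hit probability: (1/1000) * (1/10).
    steps = 0
    while True:
        steps += 1
        pos = rng.randrange(LENGTH)            # mutate a random position
        char = rng.randrange(ALPHABET_SIZE)    # to a random character
        if pos == DIFF_POS and char == TARGET_CHAR:
            return steps

runs = [steps_to_hit() for _ in range(300)]
print(sum(runs) / len(runs))   # close to 10,000 steps, as computed above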

Don't you see the problem with this little scenario of yours?
Certainly this is a common mistake made by evolutionists, but it is
nonetheless a fallacy of logic. What you have done is assume that
the density of beneficial states is unimportant to the problem of
evolution since it is possible to have the beneficial states clustered
around your starting point. But such a close proximity of beneficial
states is highly unlikely. On average, the beneficial states will be
more widely distributed throughout the sequence space.

For example, say that there are 10 beneficial sequences in this
sequence space of 1e1000. Now say one of these 10 beneficial
sequences just happens to be one change away from your starting point
and so the gap is only a random walk of 10,000 steps as you calculated
above. However, on average, how long will it take to find any one of
the other 9 beneficial states? That is the real question. You rest
your faith in evolution on this inane notion that all of these states
will be clustered around your starting point. If they were, that
certainly would be a fabulous stroke of luck - like it was *designed*
that way. But, in real life, outside of intelligent design, such
strokes of luck are so remote as to be impossible for all practical
purposes. On average we would expect that the other nine sequences
would be separated from each other and our starting point by an
astronomical number of random walk steps/mutations (i.e., with a
10-character alphabet, it is reasonable to expect around 900
differences, on average, between each of the 10 beneficial
sequences). So, even if a starting sequence did happen to
be so extraordinarily lucky to be just one positional change away from
one of the "winning" sequences, the odds are that this luck will not
hold up as well in the evolution of any of the other 9 "winning"
sequences this side of a practical eternity of time.
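
The expected separation itself is easy to estimate (a sketch assuming,
as in the example under discussion, length-1000 strings over a
10-character alphabet, with beneficial sequences placed uniformly at
random):

import random

rng = random.Random(3)
LENGTH, ALPHABET_SIZE = 1000, 10

def random_seq():
    return [rng.randrange(ALPHABET_SIZE) for _ in range(LENGTH)]

def hamming(a, b):
    # Count the positions at which two sequences differ.
    return sum(x != y for x, y in zip(a, b))

pairs = [hamming(random_seq(), random_seq()) for _ in range(200)]
print(sum(pairs) / len(pairs))   # ~900 = 1000 * (9/10) expected differences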

Real time experiments support this position rather nicely. For
example, a recent and very interesting paper was published by Lenski
et al., entitled "The Evolutionary Origin of Complex Features" in
the 2003 May issue of Nature. In this particular experiment the
researchers studied 50 different populations, or genomes, of 3,600
individuals. Each individual began with 50 lines of code and no
ability to perform "logic operations". Those that evolved the ability
to perform logic operations were rewarded, and the rewards were larger
for operations that were "more complex". After only15,873 generations,
23 of the genomes yielded descendants capable of carrying out the most
complex logic operation: taking two inputs and determining if they are
equivalent (the "EQU" function).

In principle, 16 mutations (recombinations) coupled with the three
instructions that were present in the original digital ancestor could
have combined to produce an organism that was able to perform the
complex equivalence operation. According to the researcher themselves,
"Given the ancestral genome of length 50 and 26 possible instructions
at each site, there are ~5.6 x 10e70 genotypes [sequence space]; and
even this number underestimates the genotypic space because length
evolves."

Of course this sequence space was overcome in smaller steps. The
researchers arbitrarily defined 6 other sequences as beneficial (NAND,
AND, OR, NOR, XOR, and NOT functions). The average gap between these
pre-defined steppingstone sequences was 2.5 steps, translating into an
average search space between beneficial sequences of only 3,400 random
walk steps. Of course, with a population of 3,600 individuals, a
random walk of 3,400 steps will be covered in short order by
at least one member of that population. And, this is exactly what
happened. The average number of mutations required to cross the
16-step gap was only 103 mutations per population.
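
These figures can be sanity-checked with trivial arithmetic (my
sketch; the instruction-set size of 26 and the 2.5-step average gap
are taken from the text above):

instructions = 26      # possible instructions per site, per the paper
avg_gap_steps = 2.5    # average gap between rewarded functions

search_space = instructions ** avg_gap_steps
print(round(search_space))        # ~3,447 - the "only 3,400" cited above

# With ~3,600 individuals mutating each generation, a space of ~3,400
# variants can plausibly be covered by at least one population member.
print(3600 / search_space > 1)    # True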

Now that is lightning-fast evolution. Certainly if real-life
evolution were actually based on this sort of setup, then evolution of
novel functions at all levels of complexity would be a piece of cake.
Of course, this is where most descriptions of this most interesting
experiment stop. But, what the researchers did next is the most
important part of this experiment.

Interestingly enough, Lenski and the other scientists went on to set
up different environments to see which environments would support the
evolution of all the potentially beneficial functions - to include the
most complex EQU function. Consider the following description about
what happened when various intermediate steps were not arbitrarily
defined by the scientists as "beneficial".

"At the other extreme, 50 populations evolved in an environment where
only EQU was rewarded, and no simpler function yielded energy. We
expected that EQU would evolve much less often because selection would
not preserve the simpler functions that provide foundations to build
more complex features. Indeed, none of these populations evolved EQU,
a highly significant difference from the fraction that did so in the
reward-all environment (P = 4.3 x 10e-9, Fisher's exact test).
However, these populations tested more genotypes, on average, than did
those in the reward-all environment (2.15 x 10e7 versus 1.22 x 10e7;
P<0.0001, Mann-Witney test), because they tended to have smaller
genomes, faster generations, and thus turn over more quickly. However,
all populations explored only a tiny fraction of the total genotypic
space. Given the ancestral genome of length 50 and 26 possible
instructions at each site, there are ~5.6 x 10e70 genotypes; and even
this number underestimates the genotypic space because length
evolves."

Isn't that just fascinating? When the intermediate stepping stone
functions were removed, the neutral gap that was created successfully
blocked the evolution of the EQU function, which happened *not* to be
right next door to their starting point. Of course, this is only to
be expected based on statistical averages that go strongly against the
notion that very many possible starting points would just happen to be
very close to an EQU functional sequence in such a vast sequence
space.

Now, isn't this consistent with my predictions? This experiment was
successful because the intelligent designers were capable of defining
what sequences were "beneficial" for their evolving "organisms." If
enough sequences are defined as beneficial and they are placed in just
the right way, with the right number of spaces between them, then
certainly such a high ratio will result in rapid evolution - as we saw
here. However, when neutral non-defined gaps are present, they are a
real problem for evolution. In this case, a gap of just 16 neutral
mutations effectively blocked the evolution of the EQU function.

http://naturalselection.0catch.com/Files/computerevolution.html

> Thus, a random
> walk that restarts each time after the first step (or alternatively, a
> random walk performed by a large population of sequences, each
> starting at state ABC...) is expected to explore, on average, 10000
> states before finding the next beneficial sequence.

Yes, but you are failing to consider the likelihood that your "winning
sequence" will in fact be within these 10,000 steps on average.

> Now, below, we
> will apply your model to the same problem.

Oh, I can hardly wait!

> > It also depends
> > upon how fast this space is searched through. For example, if the
> > ratio of beneficial states to non-beneficial states is as high as say,
> > 1 in 1e12, and if 1e9 states are searched each second, how long will
> > it take, on average, to find a new beneficial state?
>
> OK. Let's take my example, instead, and apply your calculations.
> There are only 2 beneficial sequences, out of the state space of
> 1e1000 sequences.

Ok, I'm glad that you at least realize the size of the state space.

> Since the ratio of beneficial sequences to
> non-beneficial ones is (2/10^1000), if your "statistics" are correct,
> then I should be exploring 10^1000/2 states, on average, before
> finding the next beneficial state. That is a huge, huge, huge number.
> So why does my very simple random walk explore only 10,000 states,
> when the ratio of beneficial sequences is so small?

Yes, that is the real question and the answer is very simple - You
either got unbelievably lucky in the positioning of your start point
or your "beneficial" sequences were clustered by intelligent design.

> The answer is simple - the ratio of beneficial states does NOT matter!

Yes it does. You are ignoring the highly unlikely nature of your
scenario. Tell me, how often do you suppose your start point would
just happen to be so close to the only other beneficial sequence in
such a huge sequence space? Hmmmm? I find it just extraordinary that
you would even suggest such a thing as "likely" with all sincerity of
belief. The ratio of beneficial to non-beneficial in your
hypothetical scenario is absolutely miniscule and yet you still have
this amazing faith that the starting point will most likely be close
to the only other "winning" sequence in an absolutely enormous
sequence space?! Your logic here is truly mysterious and your faith
is most impressive. I'm sorry, but I just can't get into that boat
with you. You are simply beyond me.

> All that matters is their distribution, and how well a particular
> random walk is suited to explore this distribution.

Again, you must consider the odds that your "distribution" will be so
fortuitous as you seem to believe it will be. In fact, it has to be
this fortuitous in order to work. It basically has to be a set up for
success. The deck must be stacked in an extraordinary way in your
favor in order for your position to be tenable. If such a stacked
deck happened at your table in Las Vegas you would be asked to leave
the casino in short order or be arrested for "cheating" by intelligent
design since such deck stacking only happens via intelligent design.
Mindless processes cannot stack the deck like this. It is
statistically impossible - for all practical purposes.

> (Again, it is a
> gross, meaningless over-simplification to model evolution as a random
> walk over a frozen N-dimensional sequence space, but my point is that
> your calculations are wrong even for that relatively simple model.)

Come now Robin - who is trying to stack the deck artificially in their
own favor here? My calculations are not based on the assumption of a
stacked deck like your calculations are, but upon a more likely
distribution of beneficial sequences in sequence space. The fact of
the matter is that sequence space does indeed contain vastly more
absolutely non-beneficial sequences than it does those that are even
remotely beneficial. In fact, there is an entire theory called the
"Neutral Theory of Evolution". Of all mutations that occur in every
generation in say, humans (around 200 to 300 per generation), the
large majority of them are completely "neutral" and those few that are
functional are almost always detrimental. This ratio of beneficial to
non-beneficial is truly small and gets exponentially smaller with each
step up the ladder of specified functional complexity. Truly,
evolution gets into very deep weeds very quickly beyond the lowest
levels of functional/informational complexity.

> > It will take
> > just over 1,000 seconds - a bit less than 20 minutes on average. But,
> > what happens if at higher levels of functional complexity the density
> > of beneficial functions decreases exponentially with each step up the
> > ladder? The rate of search stays the same, but the junk sequences
> > increase exponentially and so the time required to find the rarer and
> > rarer beneficial states also increases exponentially.
>
> The above is only true if you use the following search algorithm:
>
> 1. Generate a completely random N-character sequence
> 2. If the sequence is beneficial, say "OK";
> Otherwise, go to step 1.

Actually the above is also true if you start with a likely starting
point. A likely starting point will be an average distance away from
the next closest beneficial sequence. A random mutation to a sequence
that does not find the new beneficial sequence will not be selectable
as advantageous and a random walk will begin.

> For an alphabet of size S, where only k characters are "beneficial"
> for each position, the above search algorithm will indeed need to explore
> exponentially many states in N (on average, (S/k)^N), before finding a
> beneficial state. But, this analysis applies only to the above search
> algorithm - an extremely naive approach that resembles nothing that
> is going on in nature.
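
A minimal sketch of that naive "generate and test" search (toy numbers
only - S, k, and N here are illustrative, not drawn from any real
protein) shows the expected number of tries growing as (S/k)^N:

import random

S, k, N = 4, 1, 7      # alphabet size, tolerated characters per position, length
target = [0] * N       # the single "beneficial" sequence (so k = 1)
rng = random.Random(2)

def tries_until_hit():
    # Generate a completely random sequence each time; stop on success.
    tries = 0
    while True:
        tries += 1
        if [rng.randrange(S) for _ in range(N)] == target:
            return tries

runs = [tries_until_hit() for _ in range(30)]
print(sum(runs) / len(runs))   # near (S/k)**N = 4**7 = 16,384 on average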

Oh really? How do you propose that nature gets around this problem?
How does nature stack the deck so that its starting point is so close
to all the beneficial sequences that otherwise have such a low density
in sequence space?

> The above algorithm isn't even a random walk
> per se, since random walks make local modifications to the current
> state, rather than generate entire states anew.

The random walk I am talking about does indeed make local
modifications to a current sequence. However, if you want to get from
the type of function produced by one state to a new type of function
produced by a different state/sequence, you will need to eventually
leave your first state and move onto the next across whatever neutral
gap there might be in the way. If a new function requires a sequence
that does not happen to be as fortuitously close to your starting
sequence as you like to imagine, then you might be in just a bit of a
pickle. Please though, do explain to me how it is so easy to get from
your current state, one random walk step at a time, to a new state
with a new type of function when the density of beneficial sequences
of the new type of function are extraordinarily infinitesimal?

> A random walk
> starting at a given beneficial sequence, and allowing certain
> transitions from one sequence to another, would require a completely
> different type of analysis. In the analyses of most such search
> algorithms, the "ratio" of beneficial sequences would be irrelevant -
> it is their *distribution* that would determine how well such an
> algorithm would perform.

The most likely distribution of beneficial sequences is determined by
their density/ratio. You cannot simply assume that the deck will be
so fantastically stacked in the favor of your neat little evolutionary
scenario. I mean really, if the deck was stacked like this with lots
of beneficial sequences neatly clustered around your starting point,
evolution would happen very quickly. Of course, there have been those
who propose the "Baby Bear Hypothesis". That is, the clustering is
"just right" so that the theory of evolution works. That is the best
you can hope for. Against all odds the deck was stacked just right so
that we can still believe in evolution. Well, if this were the case
then it would still be evolution by design. Mindless processes just
can't stack the deck like you are proposing.

> My example above demonstrates a problem
> where the ratio of beneficial states is extremely tiny, yet the
> search finds a new beneficial state relatively quickly.

Yes - because you stacked the deck in your favor via deliberate
design. You did not even try to explain the likelihood of this
scenario in real life. How do you propose that this is even a remote
reflection of what mindless processes are capable of? I'm talking
average probabilities here while you are talking about extraordinarily
unlikely scenarios that are basically impossible outside of deliberate
design.

> I could also
> very easily construct an example where the ratio is nearly one, yet a
> random walk starting at a given beneficial sequence would stall with a
> very high probability.

Oh really? You can construct a scenario where all sequences are
beneficial and yet evolution cannot evolve a new one? Come on now . .
. now you're just being silly. But I certainly would like to see you
try and set up such a scenario. I think it would be most
entertaining.

> In other words, Sean, your calculations are
> irrelevant for the kind of problem you are trying to analyze.

Only if you want to bury your head in the sand and force yourself to
believe in the fairytale scenarios that you are trying to float.

> If you
> wish to model evolution as a random walk of point mutations on a
> frozen N-dimensional sequence space, you will need to apply a totally
> different statistical analysis: one that takes into account the
> distributions of known "beneficial" sequences in sequence space. And
> then I'll tell you why that model too is so wrong as to be totally
> irrelevant.

And if you wish to model evolution as a walk between tight clusters of
beneficial sequences in an otherwise extraordinarily low density
sequence space, then I have some oceanfront property in Arizona to
sell you at a great price.

Until then, this is all I have time for today.

> Cheers,
> RobinGoodfellow.

Sean
www.naturalselection.0catch.com

RobinGoodfellow

unread,
Jan 4, 2004, 3:15:15 AM1/4/04
to
Sean Pitman wrote:

> lmuc...@yahoo.com (RobinGoodfellow) wrote in message news:<81fa9bf3.04010...@posting.google.com>...
>
>>seanpi...@naturalselection.0catch.com (Sean Pitman) wrote in message news:<80d0c26f.03123...@posting.google.com>...
>>

[snip]

> With all due respect, what is your area of professional training? I
> mean, after reading your post I dare say that you are not only weak in
> biology, but statistics as well. Certainly your numbers and
> calculations are correct, but the logic behind your assumptions is
> extraordinarily fanciful. You sure wouldn't get away with such
> assumptions in any sort of peer reviewed medical journal or other
> statistically based science journal - that's for sure. Of course, you
> may have good success as a novelist . . .

Tsk, tsk... I thank you for the career advice. I'll keep it in mind,
should my current stint in computer science fall through. I wouldn't go
so far as to say that Monte-Carlo methods are my specialty, but I will
say that my own research and the research of half my colleagues would be
non-existent if they worked the way you think they do.

>>I'll try to address some of the mistakes you've made below, though I
>>doubt that I can do much to dispel your misconceptions. Much of my
>>reply will not even concern evolution in a real sense, since I wish to
>>highlight and address the mathematical errors that you are making.
>
>
> What you ended up doing is highlighting your misunderstanding of
> probability as it applies to this situation as well as your amazing
> faith in an extraordinary stacking of the deck which allows evolution
> to work as you envision it working. Certainly, if evolution is true
> then you must be correct in your views. However, if you are correct
> in your views as stated then it would not be evolution via mindless
> processes alone, but evolution via a brilliant intelligently designed
> stacking of the deck.

Exactly what views did I state, Sean? Other than that your calculations
are, to put it plainly, irrelevant. Not even wrong - just irrelevant.

Yes, the example I give below incredibly stacks the deck in my favor.
It ought to. It is what is called a "counter-example". It falsifies
the hypothesis that your "model" of evolution is correct. Now aren't
you glad you proposed something falsifiable?

>
>>>RobinGoodfellow <lmuc...@yahoo.com> wrote in message news:<bsd7ue$r1c$1...@news01.cit.cornell.edu>...
>>
>>
>>
>>>>It is even worse than that. Even random walks starting at random points
>>>>in N-dimensional space can, in theory, be used to sample the states
>>>>with a desired property X (such as Sean's "beneficial sequences"), even
>>>>if the number of such states is exponentially small compared to the
>>>>total state space size.
>>>
>>>This depends upon just how exponentially small the number of
>>>beneficial states is relative to the state space.
>>
>>No, it does not. If you take away anything from this discussion, it
>>has to be this: the relative number of beneficial states has virtually
>>no bearing on the amount of time a local search algorithm will need to
>>find such a state.
>
>
> LOL - You really don't have a clue how insane this statement is?

When you're done laughing, would you care to explain to me why it is
insane? Especially when I can construct examples (and, if you so wish,
give you examples of real-world problems) that show this statement is true?

>>The things that *would* matter are the
>>distribution of beneficial states through the state space, the types
>>of steps the local search is allowed to take (and the probabilities
>>associated with each step), and the starting point.
>
> This distribution of states has very little if anything to do with how
> much time it takes to find one of them on average. The starting point
> certainly is important to initial success, but it also has very little
> if anything to do with the average time needed to find more and more
> beneficial functions within that same level of complexity.

Except in every real example of a working Monte-Carlo procedure, where
the distribution and starting point have *everything* to do with whether such
a procedure is successful or not.

> For
> example, if all the beneficial states were clustered together in one
> or two areas, the average starting point, if anything, would be
> farther way than if these states were distributed more evenly
> throughout the sequence space. So, this leaves the only really
> relevant factor - the types of steps and the number of steps per unit
> of time. That is the only really important factor in searching out
> the state space - on average.

*Sigh*. The problem is that the model *you* are proposing (one I think
is silly) is of a random walk on a specific frozen sequence space
with beneficial sequences as points in that space. It does not deal
with an "average" distribution, and an "average" starting point, but
with one very specific distribution of beneficial sequences and one very
specific starting point. You cannot simply assume an "average"
distribution in the absence of background information: you have to find
out precisely the kind of distribution you are dealing with. And even
if you do find that the distribution is "stacked", it does not imply
that an intelligence was involved. The stacking could occur due to the
constraints imposed by the very definition of the problem: in the case
of evolution, by the physical constraints governing the interactions
between the molecules involved in biological systems. In fact, why
would you expect that the regular and highly predictable physical laws
governing biochemical reactions would produce a random, "average"
distribution of "beneficial sequences"?

>>For an extreme
>>example, consider a space of strings consisting of length 1000, where
>>each position can be occupied by one of 10 possible characters.

Note, I wrote, "extereme example". My point was *not* invent a
distribution which makes it likely for evolutiuon to occur (this example
has about as much to do with evolution as ballet does with quantum
mechanics), but to show how inadequate your methods are.

>
> Ok. This would give you a state space of 10 to the power of 1000 or
> 1e1000. That is an absolutely enormous number.
>
>
>>Suppose there are only two beneficial strings: ABC........, and
>>BBC........ (where the dots correspond to the same characters). The
>>allowed transitions between states are point mutations, that are
>>equally probable for each position and each character from the
>>alphabet. Suppose, furthermore, that we start at the beneficial state
>>ABC. Then, the probability of a transition from ABC... to BBC... in a
>>single mutation is 1/(10*1000) = 1/10000 (assuming self-loops - i.e.
>>mutations that do not alter the string, are allowed).
>
>
> You are good so far. But, you must ask yourself this question: What
> are the odds that out of a sequence space of 1e1000 the only two
> beneficial sequences with uniquely different functions will have a gap
> between them of only 1 in 10,000?

Mind-numbingly low: 1000 * 0.9 * 0.1^999, or about 9e-997, to be
precise.  But that is not the point.

> The time required to cross this
> tiny gap would require a random walk of only 10,000 steps on average.
> For a decent sized population, this could be done in just one
> generation.

> Don't you see the problem with this little scenario of yours?
> Certainly this is a common mistake made by evolutionists, but it is
> nonetheless a fallacy of logic.  What you have done is assume that
> the density of beneficial states is unimportant to the problem of
> evolution since it is possible to have the beneficial states clustered
> around your starting point. But such a close proximity of beneficial
> states is highly unlikely. On average, the beneficial states will be
> more widely distributed throughout the sequence space.

On average, yes. But didn't you just say above that the distribution
of the sequences is irrelevant?  That all that matters is the "ratio" of
beneficial sequences? (Incidentally, "ratio" and "density" are not
identical. The distribution I showed you has a relatively high density
of beneficial sequences, despite a low ratio.)
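
Incidentally, this is a five-minute simulation.  Here is a minimal
Python sketch of the toy example above (the letters and the shared "C"
tail are arbitrary stand-ins; the walk restarts after each step, as
stipulated):

    import random
    from statistics import mean

    N, ALPHABET = 1000, "ABCDEFGHIJ"    # 10^1000 possible sequences
    tail   = "C" * (N - 2)
    start  = "A" + "B" + tail           # beneficial sequence ABC...
    target = "B" + "B" + tail           # the only other one, BBC...

    def mutate(seq):
        # one point mutation; the new character may equal the old
        # one (self-loops allowed, as in the example)
        i = random.randrange(N)
        return seq[:i] + random.choice(ALPHABET) + seq[i+1:]

    def steps_to_target():
        # restart from `start` after every step
        steps = 0
        while True:
            steps += 1
            if mutate(start) == target:
                return steps

    print(mean(steps_to_target() for _ in range(20)))
    # prints roughly 10000 -- not 10^1000 / 2 -- even though only
    # 2 of the 10^1000 sequences are "beneficial"

Run it yourself: the ratio predicts a number with a thousand digits;
the walk delivers four zeros.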

> For example, say that there are 10 beneficial sequences in this
> sequence space of 1e1000. Now say one of these 10 beneficial
> sequences just happens to be one change away from your starting point
> and so the gap is only a random walk of 10,000 steps as you calculated
> above. However, on average, how long will it take to find any one of
> the other 9 beneficial states? That is the real question. You rest
> your faith in evolution on this inane notion that all of these states
> will be clustered around your starting point. If they were, that
> certainly would be a fabulous stroke of luck - like it was *designed*
> that way. But, in real life, outside of intelligent design, such
> strokes of luck are so remote as to be impossible for all practical
> purposes. On average we would expect that the other nine sequences
> would be separated from each other and our starting point by around
> 1e999 random walk steps/mutations (i.e., on average it is reasonable
> to expect there to be around 999 differences between each of the 10
> beneficial sequences). So, even if a starting sequence did happen to
> be so extraordinarily lucky to be just one positional change away from
> one of the "winning" sequences, the odds are that this luck will not
> hold up as well in the evolution of any of the other 9 "winning"
> sequences this side of a practical eternity of time.

Unless, of course, it follows from the properties of the problem that
the other 9 beneficial sequences must be close to the starting sequence.

> Real time experiments support this position rather nicely. For
> example, a recent and very interesting paper was published by Lenski
> et. al., entitled, "The Evolutionary Origin of Complex Features" in
> the 2003 May issue of Nature. In this particular experiment the
> researchers studied 50 different populations, or genomes, of 3,600
> individuals. Each individual began with 50 lines of code and no
> ability to perform "logic operations". Those that evolved the ability
> to perform logic operations were rewarded, and the rewards were larger
> for operations that were "more complex".  After only 15,873 generations,
> 23 of the genomes yielded descendants capable of carrying out the most
> complex logic operation: taking two inputs and determining if they are
> equivalent (the "EQU" function).

I've already covered how you've completely misinterpreted Lenski's
research in the other post. But let's run with this for a bit:

> In principle, 16 mutations (recombinations) coupled with the three
> instructions that were present in the original digital ancestor could
> have combined to produce an organism that was able to perform the
> complex equivalence operation. According to the researchers themselves,
> "Given the ancestral genome of length 50 and 26 possible instructions
> at each site, there are ~5.6 x 10e70 genotypes [sequence space]; and
> even this number underestimates the genotypic space because length
> evolves."
>
> Of course this sequence space was overcome in smaller steps. The
> researchers arbitrarily defined 6 other sequences as beneficial (NAND,
> AND, OR, NOR, XOR, and NOT functions).

As a minor quibble, I believe they actually started with NAND (you need
it for all the other functions).  But I could be wrong - I read that
paper months ago.

And after years of painstaking research, Sean finally invents the wheel.
Yes, evolution does not pop complex systems out of thin air, but
constructs them through integration and co-optation of simpler functional
components. Move along, folks, nothing to see here!

> Isn't that just fascinating? When the intermediate stepping stone
> functions were removed, the neutral gap that was created successfully
> blocked the evolution of the EQU function, which happened *not* to be
> right next door to their starting point. Of course, this is only to
> be expected based on statistical averages that go strongly against the
> notion that very many possible starting points would just happen to be
> very close to an EQU functional sequence in such a vast sequence
> space.

Here's a question for you. There were only 5 beneficial functions in
that big old sequence space of yours. They are all very standard
Boolean functions: in no way were they specifically designed by Lenski
et al. to ease the way into evolving the EQU function.  How come
they were all sufficiently close in sequence space to one another, when
according to you such a thing is so highly improbable?

> Now, isn't this consistent with my predictions? This experiment was
> successful because the intelligent designers were capable of defining
> what sequences were "beneficial" for their evolving "organisms." If
> enough sequences are defined as beneficial and they are placed in just
> the right way, with the right number of spaces between them, then
> certainly such a high ratio will result in rapid evolution - as we saw
> here. However, when neutral non-defined gaps are present, they are a
> real problem for evolution. In this case, a gap of just 16 neutral
> mutations effectively blocked the evolution of the EQU function.

You are not even close.  Lenski et al. didn't define which *sequences*
were "beneficial".  They didn't even design functions to serve
specifically as stepping stones in the evolutionary pathways of EQU.
What they have done is to name some functions of intermediate complexity
that might be beneficial to the organism. They certainly did not tell
their program how to reach these functions, or what the systems
performing these functions might look like, but simply indicated that
there are functions at varying levels of complexity that might be useful
to an organism in its environment. Thus, they have demonstrated exactly
what they set out to: that in evolution, complex functional features are
acquired through co-optation and modification of simpler ones.

> http://naturalselection.0catch.com/Files/computerevolution.html

Thanks, but when I'm in the mood for a laugh, I prefer The Onion,
talk.origins feedback pages, or Fox News. :)

>> Thus, a random
>>walk that restarts each time after the first step (or alternatively, a
>>random walk performed by a large population of sequences, each
>>starting at state ABC...) is expected to explore, on average, 10000
>>states before finding the next beneficial sequence.
>
>
> Yes, but you are failing to consider the likelihood that your "winning
> sequence" will in fact be within these 10,000 steps on average.
>
>>Now, below, we
>>will apply your model to the same problem.
>
>
> Oh, I can hardly wait!
>
>
>>>It also depends
>>>upon how fast this space is searched through. For example, if the
>>>ratio of beneficial states to non-beneficial states is as high as say,
>>>1 in 1e12, and if 1e9 states are searched each second, how long will
>>>it take, on average, to find a new beneficial state?
>>
>>OK. Let's take my example, instead, and apply your calculations.
>>There are only 2 beneficial sequences, out of the state space of
>>1e1000 sequences.
>
>
> Ok, I'm glad that you at least realize the size of the state space.

Yes, Sean, because your statistical argument is so-oooo sophisticated
that we simple folk can't keep up...

>
>>Since the ratio of beneficial sequences to
>>non-beneficial ones is (2/10^1000), if your "statistics" are correct,
>>then I should be exploring 10^1000/2 states, on average, before
>>finding the next beneficial state. That is a huge, huge, huge number.
>>So why does my very simple random walk explore only 10,000 states,
>>when the ratio of beneficial sequences is so small?
>
>
> Yes, that is the real question and the answer is very simple - You
> either got unbelievably lucky in the positioning of your start point
> or your "beneficial" sequences were clustered by intelligent design.

But, Sean, I don't understand! You were telling me just above that the
distribution doesn't matter at all! I am applying your very rigorous,
unquestionably correct method for computing the average number of states
examined (that should work regardless of distribution and starting
point), and it tells me I should be examining 10^1000/2 states on
average. So why on earth am I examining only 10,000? Is it just
remotely possible that the distribution, and *not* the ratio, might be
what is playing the deciding role?

Once you say "yes", then you and I can talk what an average distribution
will look like, and whether question of "average" is relevant or not.
But if you say "no", please tell me why your calculation fails so
miserably for my counter-example.

>>The answer is simple - the ratio of beneficial states does NOT matter!
>
>
> Yes it does. You are ignoring the highly unlikely nature of your
> scenario. Tell me, how often do you suppose your start point would
> just happen to be so close to the only other beneficial sequence in
> such a huge sequence space? Hmmmm? I find it just extraordinary that
> you would even suggest such a thing as "likely" with all sincerity of
> belief.

And have I done so? Though, now that you mention it, it may very well
be likely, and in fact even necessary, depending on the nature of the
problem we are examining. (And again, please remember that my toy
example has absolutely nothing to do with biological evolution - I am
just pointing out the general inadequacy of your methodology.)

> The ratio of beneficial to non-beneficial in your
> hypothetical scenario is absolutely miniscule and yet you still have
> this amazing faith that the starting point will most likely be close
> to the only other "winning" sequence in an absolutely enormous
> sequence space?! Your logic here is truly mysterious and your faith
> is most impressive. I'm sorry, but I just can't get into that boat
> with you. You are simply beyond me.

I am glad that I possess so much mystique in your mind's eye. :) But,
again, the purpose of my example was to blow a hole in your probability
calculations, rather than to present a workable scenario of evolution.
All I was trying to argue with this example is that your math needs a
lot of work.

>> All that matters is their distribution, and how well a particular
>>random walk is suited to explore this distribution.
>
>
> Again, you must consider the odds that your "distribution" will be so
> fortuitous as you seem to believe it will be. In fact, it has to be
> this fortuitous in order to work.

Again, I can present you with examples of real world problems where
these distributions just happen to be this fortuitous.  If they
weren't, then Monte-Carlo methods would be useless in solving them.
Remember, these distributions don't arise at random; they follow
necessarily from the properties of the problem. So your arguments about
"averages" don't apply here.

> It basically has to be a set up for
> success. The deck must be stacked in an extraordinary way in your
> favor in order for your position to be tenable. If such a stacked
> deck happened at your table in Las Vegas you would be asked to leave
> the casino in short order or be arrested for "cheating" by intelligent
> design since such deck stacking only happens via intelligent design.
> Mindless processes cannot stack the deck like this. It is
> statistically impossible - for all practical purposes.
>
>>(Again, it is a
>>gross, meaningless over-simplification to model evolution as a random
>>walk over a frozen N-dimensional sequence space, but my point is that
>>your calculations are wrong even for that relatively simple model.)
>
>
> Come now Robin - who is trying to stack the deck artificially in their
> own favor here? My calculations are not based on the assumption of a
> stacked deck like your calculations are, but upon a more likely
> distribution of beneficial sequences in sequence space. The fact of
> the matter is that sequence space does indeed contain vastly more
> absolutely non-beneficial sequences than it does those that are even
> remotely beneficial.

Yes, but your calculations are based on the equally unfounded assumption
that the deck is not stacked in any way, shape, or form. (That is, if
the sequences were really distributed evenly in your frozen sequence
space, then your probability calculation would still be off, but not by
too much.) What makes you think that the laws of physics do not stack
the deck sufficiently to make evolution possible? You may feel that
they can't: but in the meantime, you should be striving to find out what
the actual distribution is, rather than assuming it is unstacked. (Not
that this would make your model relevant, but it'll be a small step in
the right direction.)

> In fact, there is an entire theory called the
> "Neutral Theory of Evolution". Of all mutations that occur in every
> generation in say, humans (around 200 to 300 per generation), the
> large majority of them are completely "neutral" and those few that are
> functional are almost always detrimental. This ratio of beneficial to
> non-beneficial is truly small and gets exponentially smaller with each
> step up the ladder of specified functional complexity. Truly,
> evolution gets into very deep weeds very quickly beyond the lowest
> levels of functional/informational complexity.

The fact that the vast majority of mutations are neutral does not imply
that there exists any point where there is no opportunity for a
beneficial mutation. And where such an opportunity presents itself,
evolution will eventually find it, given large enough populations and
sufficient time.

>>>It will take
>>>just over 1,000 seconds - a bit less than 20 minutes on average. But,
>>>what happens if at higher levels of functional complexity the density
>>>of beneficial functions decreases exponentially with each step up the
>>>ladder? The rate of search stays the same, but the junk sequences
>>>increase exponentially and so the time required to find the rarer and
>>>rarer beneficial states also increases exponentially.
>>
>>The above is only true if you use the following search algorithm:
>>
>> 1. Generate a completely random N-character sequence
>> 2. If the sequence is beneficial, say "OK";
>> Otherwise, go to step 1.
>
>
> Actually the above is also true if you start with a likely starting
> point. A likely starting point will be an average distance away from
> the next closest beneficial sequence. A random mutation to a sequence
> that does not find the new beneficial sequence will not be selectable
> as advantageous and a random walk will begin.

Actually, your last paragraph will be approximately true only if all
your "beneficial" points are uniformly spread out through your sequence
space.  Even then, your probability calculation will be off by some
orders of magnitude, since you will actually need to apply combinatorial
formulas to compute these probabilities correctly.  But, I suppose,
it'll be close enough.
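
If you are curious what those combinatorial formulas look like, here is
a sketch in Python, using the toy space from earlier in this thread (a
Hamming ball is just the set of sequences within d mutations of a fixed
sequence):

    from math import comb

    def ball(N, S, d):
        # number of length-N sequences over an S-letter alphabet
        # within Hamming distance d of a fixed sequence
        return sum(comb(N, i) * (S - 1) ** i for i in range(d + 1))

    for d in (1, 2, 3):
        print(d, ball(1000, 10, d))
    # 9001, then ~4.0e7, then ~1.2e11 -- every one of them utterly
    # negligible next to the 10^1000 sequences in the whole space,
    # which is why the global ratio says nothing about what a *local*
    # walk actually encounters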

>
>>For an alphabet of size S, where only k characters are "beneficial"
>>for each position, the above search algorithm will indeed need to explore
>>exponentially many states in N (on average, (S/k)^N), before finding a
>>beneficial state. But, this analysis applies only to the above search
>>algorithm - an extremely naive approach that resembles nothing that
>>is going on in nature.
>
>
> Oh really? How do you propose that nature gets around this problem?
> How does nature stack the deck so that its starting point is so close
> to all the beneficial sequences that otherwise have such a low density
> in sequence space?

OK, Sean. Pause for a second. Do you really believe that evolution
works by repeatedly generating random long nucleotide sequences *de
novo*? Yes or no? That is the algorithm I was describing above.
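
For the avoidance of doubt, here is that algorithm in Python, scaled
down (an alphabet of S=4 with k=2 "acceptable" characters per position,
instead of protein-sized numbers) so that it finishes at all:

    import random

    ALPHABET = "ABCD"    # S = 4 characters
    GOOD     = "AB"      # k = 2 "beneficial" characters per position

    def naive_tries(N):
        # generate a whole new random N-character sequence each time,
        # and keep going until every position is "good"
        tries = 0
        while True:
            tries += 1
            if all(random.choice(ALPHABET) in GOOD for _ in range(N)):
                return tries

    for N in (4, 8, 12, 16):
        avg = sum(naive_tries(N) for _ in range(30)) / 30
        print(N, round(avg))
    # the averages grow like (S/k)^N = 2^N: roughly 16, 256, 4096,
    # 65536.  Exponential in N -- and this, and only this, is the
    # search that your ratio arithmetic describes.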

>>The above algorithm isn't even a random walk
>>per se, since random walks make local modifications to the current
>>state, rather than generate entire states anew.
>
>
> The random walk I am talking about does indeed make local
> modifications to a current sequence. However, if you want to get from
> the type of function produced by one state to a new type of function
> produced by a different state/sequence, you will need to eventually
> leave your first state and move onto the next across whatever neutral
> gap there might be in the way.

If any. Depending on the distribution of states in sequence space, none
may exist.

> If a new function requires a sequence
> that does not happen to be as fortuitously close to your starting
> sequence as you like to imagine, then you might be in just a bit of a
> pickle. Please though, do explain to me how it is so easy to get from
> your current state, one random walk step at a time, to a new state
> with a new type of function when the density of beneficial sequences
> of the new type of function is extraordinarily infinitesimal?

Because the *density* need not be infinitesimal. Locally, the density
can be quite high.  Again, what, exactly, is your argument against the
idea that all the beneficial sequences observed in nature would be
necessarily clustered relatively close together in sequence space, as
required by biochemistry?  Or do you have a compelling argument that
this distribution should be purely random?

>>A random walk
>>starting at a given beneficial sequence, and allowing certain
>>transitions from one sequence to another, would require a completely
>>different type of analysis. In the analyses of most such search
>>algorithms, the "ratio" of beneficial sequences would be irrelevant -
>>it is their *distribution* that would determine how well such an
>>algorithm would perform.
>
>
> The most likely distribution of beneficial sequences is determined by
> their density/ratio. You cannot simply assume that the deck will be
> so fantastically stacked in the favor of your neat little evolutionary
> scenario. I mean really, if the deck was stacked like this with lots
> of beneficial sequences neatly clustered around your starting point,
> evolution would happen very quickly. Of course, there have been those
> who propose the "Baby Bear Hypothesis". That is, the clustering is
> "just right" so that the theory of evolution works. That is the best
> you can hope for.

Or it would be, if I thought the model you propose was even remotely
realistic. But, here are a few hints for you: 1) the "sequence space"
does not have a fixed dimension; 2) the multi-dimensional fitness
landscape changes over time, partially as the result of evolutionary
processes. If your statistics are inadequate even for your very simple
model, how can you expect them to be even remotely relevant for a
problem that is much, much more complicated?

> Against all odds the deck was stacked just right so
> that we can still believe in evolution. Well, if this were the case
> then it would still be evolution by design. Mindless processes just
> can't stack the deck like you are proposing.

Really, now? I would think that processes operating with predictable
regularity might be able to. Such as, say, the laws of physics.
Remember: "mindless" != "random".

>
>>My example above demonstrates a problem
>>where the ratio of beneficial states is exteremely tiny, yet the
>>search finds a new beneficial state relatively quickly.
>
>
> Yes - because you stacked the deck in your favor via deliberate
> design. You did not even try to explain the likelihood of this
> scenario in real life. How do you propose that this is even a remote
> reflection of what mindless processes are capable of? I'm talking
> average probabilities here while you are talking about extraordinarily
> unlikely scenarios that are basically impossible outside of deliberate
> design.

And I am saying average probabilities do not apply when you are
concerned with determining one particular distribution. And as my
example shows, the distribution is all that matters. Find the
distribution, if you can, and then we'll talk.

>> I could also
>>very easily construct an example where the ratio is nearly one, yet a
>>random walk starting at a given beneficial sequence would stall with a
>>very high probability.
>
>
> Oh really? You can construct a scenario where all sequences are
> beneficial and yet evolution cannot evolve a new one? Come on now . .
> . now you're just being silly. But I certainly would like to see you
> try and set up such a scenario. I think it would be most
> entertaining.

I didn't say all sequences are beneficial, Sean. That *would* be silly.
I did say that the ratio *approaches* one, but never quite reaches it.
But, here you are:

Same "sequence space" as before, but now a sequence is "beneficial" if
it is AAAAAAAAAA......AAA (all A's), or it differs from AAAAA...AAA by
at least 3 characters.  All other sequences are *harmful* - if the
random walk ever stumbles onto one, it will die off, and will need to
return to its starting point.  (This means there are exactly 1000*9 +
(1000*999/2)*81, or about 4.05e7, harmful sequences, and 1e1000 - 4.05e7,
or about 1e1000, beneficial sequences: that is, virtually every sequence
is beneficial.)  Again, the allowed transitions are point mutations, and
the starting point is none other than AAAAAAA...AAA.  Now, will this random
walk ever find another beneficial sequence?

What does this have to do with evolution? Nothing. But everything to
do with how a distribution can affect a random walk.
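
And if you would like to watch it stall, here is the same sort of
Python sketch adapted to this distribution (the letters, again, are
arbitrary):

    import random

    N, ALPHABET = 1000, "ABCDEFGHIJ"
    start = "A" * N                    # the lone viable starting point

    def walk(max_steps=20000):
        seq = start
        for step in range(max_steps):
            i = random.randrange(N)
            seq = seq[:i] + random.choice(ALPHABET) + seq[i+1:]
            d = sum(c != "A" for c in seq)   # distance from all-A's
            if 1 <= d <= 2:            # landed in the harmful shell:
                seq = start            # die off and restart
            elif d >= 3:
                return step            # found another beneficial sequence
        return None

    print(walk())
    # prints None: every single mutation away from AAAA...A lands in
    # the shell, so the walk never reaches the ~1e1000 beneficial
    # sequences sitting just beyond it.  Ratio ~1; progress, zero.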

>>In other words, Sean, your calculations are
>>irrelevant for the kind of problem you are trying to analyze.
>
>
> Only if you want to bury your head in the sand and force yourself to
> believe in the fairytale scenarios that you are trying to float.
>
>>If you
>>wish to model evolution as a random walk of point mutations on a
>>frozen N-dimensional sequence space, you will need to apply a totally
>>different statistical analysis: one that takes into account the
>>distributions of known "beneficial" sequences in sequence space. And
>>then I'll tell you why that model too is so wrong as to be totally
>>irrelevant.
>
>
> And if you wish to model evolution as a walk between tight clusters of
> beneficial sequences in an otherwise extraordinarily low density
> sequence space, then I have some oceanfront property in Arizona to
> sell you at a great price.

If I did wish to model evolution this way, then I would gladly buy this
property off your hands. And then sell it back to you at twice the
price, because it would still be better than the model you propose.

> Until then, this is all I have time for today.
>
>
>
>

> Sean
> www.naturalselection.0catch.com

Cheers,
RobinGoodfellow.

Dunk

unread,
Jan 4, 2004, 9:28:59 AM1/4/04
to


Major kudos to Von Smith for his ** format ** as well as content.

The intercalated style of argumentative reply nearly doubles in length
with each reply. This is unnecessary, since essentially the same
argument recurs, ah, more than twice.

Von Smith's 'summary of the main argument' format is very
constructive, and likely to get somewhere sooner.

Dunk

Howard Hershey

unread,
Jan 4, 2004, 10:12:07 AM1/4/04
to

Sean Pitman wrote:
>
> Howard Hershey <hers...@indiana.edu> wrote in message news:<3FF4957C...@indiana.edu>...
>
> > 1) You seem to agree that the native ebga does not have any selectable
> > lactase activity. Thus generating selectable lactase activity from ebg
> > is generating a 'new' function. Is that right?
>
> Yes.
>
> > 2) You do agree that the native ebg involves a two peptide system, with
> > ebga being 1030 amino acids long and ebgc being 149 amino acids long,
> > both being required for function, plus a regulatory protein (ebgr) which
> > is also around 1000 amino acids long.
>
> At this point it might be helpful to consider that the usual wild type
> lacZ genes in E. coli produce a tetramer beta-galactosidase. Each
> subunit of this tetramer is around 1000aa in size.

The alpha and beta subunits of ebg are 1030 and 149 amino
acids.  It also forms a multimeric enzyme.  Any statement about the
complexity of the standard w.t. lacZ gene also holds for ebg, doesn't
it? Aren't they at the same level of complexity, with perhaps a nod to
ebg (because it is slightly longer and composed of two peptides)?

> However, this is
> not the minimum size requirement for this type of function to be
> realized at a beneficial level of selectability. The minimum size
> requirement seems to be well over 400aa.

So you keep asserting without telling anyone how you poofed that number
out of thin air.

> Considering that 12 to 14 of
> 15 active site residues are identical between LacZ and ebgA, I would
> also think that the minimum sequence requirements would also be
> similar (i.e., somewhere around 400aa). Also, it is interesting to
> note that the ebgC sequence has none of the active site residues and
> yet it seems to be essential, as you noted yourself, for the lactase
> function.

This would not be surprising to anyone with a knowledge of how the alpha
peptide of the lacZ gene can complement a non-functional lacZ missing
the alpha peptide region. BTW, the alpha peptide complementation of
lacZ is the basis of many of those blue/white tests to identify bacteria
with plasmids having DNA inserts.

> It appears that this small subunit is essential for the
> optimal operation of electrophilic catalysis by the active-site Mg^2+.
> Also note that a correct mutation in either the ebgR or the ebgA
> genes alone will allow selectably advantageous lactase ability. Of
> course both mutations occurring at the same time allow for a much
> stronger lactase function, but both mutations are not required before
> selectable lactase function can be realized. It is known that the
> mutation in ebgR arises first that allows the cells to grow very
> slowly on lactulose. The second mutation (in the ebgA gene) then
> arises and allows the double mutants to grow very rapidly.

Yes. Knowing the minimum sequence requirements of a lactase is
completely irrelevant to calculating the organism's ability to evolve
lactase activity. What matters is how many mutations away from
selectable lactase activity some actual gene in the organism is.


>
> http://www.biochemj.org/bj/325/0117/3250117.pdf
> http://www.science.siu.edu/microbiology/micr460/460%20Pages/460.SPAM.html
>
> > 3) You seem to think that knowing the total length of the proteins
> > involved (in this case, about 1200 for the two that act together at the
> > same time) and how many proteins are involved in the system (2) allows
> > you to determine the number of amino acids that are 'fairly specified'
> > and the 'level of complexity'.
>
> As I have said many many times before, I am interested in knowing the
> *minimum* number and specificity of amino acids required to achieve a
> particular type of function.

Well, so am I. Not that knowing that number is relevant to evolution.
But I am interested in how you went about calculating it.

[At the very end of this long post, I find out that the minimum number
is not, in fact, the minimum number. It is an *estimate* of some
unspecified sort based on the smallest *known* sequence with that
particular activity. Why it is an *estimate* rather than the actual
number of amino acids in the smallest *known* sequence with that activity is
not addressed. Nor is any mention made of what this smallest known
sequence is. Apparently no effort at all was made to determine whether
or not the smallest *known* sequence has any relationship at all to the
*minimum* requirements for total amino acid length nor what fraction of
even the smallest *known* sequence is 'fairly specified' (which remains
undefined).  The possibility remains that the number was pulled out of
the ether.]

> I dare say that the 1200aa normally used
> in this case are not all needed and are not all that constrained.
> As explained already, a more likely minimum number of required amino
> acids is probably somewhere around 400 relatively loosely specified
> amino acids.

And, as I have repeatedly asked, both politely and not, "HOW THE F**K
DID YOU ARRIVE AT THAT NUMBER?" It seems to me that all you did was
wave your hands and poof the number out of thin air. You certainly have
been repeatedly asked to justify that number and all you have done is
wave your hands and poof it out of thin air again.

> > Please perform this mathematics on the
> > ebg system for me. If you cannot calculate "the total number of amino
> > acids required for a particular type of function to be realized at its
> > most minimum beneficial level of function" for a simple system like ebg,
> > what makes you think you can do so for a larger or more complex one?
>
> But I can. I suggest to you that the type of function produced by the
> ebg system has a very similar minimum size requirement and positional
> constraint limits as do other lactase genes/systems which seem to have
> a minimum requirement of somewhere over 400 relatively loosely
> specified amino acids.
>
> Now, you can easily prove me wrong here by finding a functional
> lactase enzyme that requires less than 400aa. Do you know of such a
> lactase that actually works to some selectable advantage in any living
> thing?

ebg is NOT a lactase. That is important for you to remember. ebg is
NOT a lactase. Before mutation it does not have selectable lactase
activity. You yourself said this was true. Repeat after me: ebg is
NOT a lactase. ebg is NOT a lactase. But I certainly agree that it is
at least as complex as the lacZ gene product and probably just as
complex as any other protein of the same length without internal repeats
if you actually have a way of measuring complexity.

> > 4) After calculating "the total number of amino acids required for a
> > particular type of function to be realized at its most minimum
> > beneficial level of function" (you claim to be able to do so -- at least
> > I have seen you give estimates of around 480 or so for ebg, but it would
> > certainly be nice to see what went into the calculation)
>
> This calculation is based on my own database search and the searches of
> others that suggest that there are no functional lactase enzymes
> smaller than 400aa.

Are all 400 aa completely constrained in the 400 aa sequence? How is
this a valid estimate of the *minimum* number of amino acids needed for
lactase activity rather than just the smallest known number? I think
this is an absurd way of determining the minimal requirements of a
lactase. Absurd both because most lactases are not independently
designed minimal structures, but are evolved from past lactases and also
because it doesn't take any knowledge about how enzymes work into
account. I agree that a certain minimum number of amino acids are
needed to form the 3-D structure needed to bring the relevant amino
acids that *independently* bind the substrate optimally (not too tight,
not too loosely) and the other amino acids that provide the nucleophile
for the hydrolysis into the proper position.

There are only a few amino acids that are actually crucially involved.
A few amino acids that determine substrate specificity -- in this case,
specifically binding a galactose sugar moiety and its glycoside linkage
and another few amino acids that provide the nucleophile that lowers the
energy of activation of hydrolysis of that linkage. Most of the amino
acids in the protein that have any other function are involved in
ensuring that those few are at the right position a sufficient amount of
the time and are flexible enough to bind the substrate and release it
after hydrolysis. That is, most of the amino acids beyond a very few
are involved in origami.  Now, and here is the important part, *any*
amino acid sequence that provides the proper origami so that the crucial
few amino acids are in the proper position will have lactase activity.
There are many ways to provide that 3-D structure, some using many more
than 400 amino acids, some using 400, and there may even be some that
use significantly less than 400 amino acids. But we only know the
lactases which *nature* found, not all possible nor, certainly, *the*
minimal lactase. In fact, nature sticks primarily with what works and
first-come, first-serve. Nature is extraordinarily wasteful in pursuit
of reproductive success, efficiency is not evident in biochemistry, and
the panda's thumb shows that first-come solutions tend to prevent even
better ones. [This is also the reason why HbS/HbA resistance to malaria
took root much more strongly than HbC/HbC resistance.] Nature has
little reason to find *the* minimal-sized lactase. Now, it may well be
that a particular type of structure may be needed to form a cleft of the
proper size and hydrophobicity to be useful for enzymatic activity.
This can be evidenced by certain deletion or insertion mutations. Up to
a point, insertions and deletions in those other parts of a protein
structure that are flexible bends have little selective effect (are
selectively neutral).

> So, it seems like the ~400aa level is the best
> "minimum" requirement that the evidence available to me so far
> supports. If you think otherwise, please do present this evidence.

I think that searching for a minimum *existing* size by looking through
the sequences of *evolved* enzymes is a poor way to find the bare
minimum total size and also will tell you little or nothing about how
many of the amino acids *within* that minimum size are 'fairly
constrained' (which remains a nebulous undefined concept).


>
> >I want you to
> > calculate the odds of ebg evolving into a selectable beta-galactosidase
> > enzyme *based solely on these numbers* and NOT based on any other
> > knowledge. This estimate of the odds of generating functional lactase
> > activity from ebg would be called a 'prediction' of *your* 'hypothesis'.
>
> The odds are extremely good that the wild-type ebg sequence will
> evolve into a selectable beta-galactosidase in short order (one or two
> generations) since it is only a single positional change away from
> success, but that is not the important question.

Yes, it is. Your *claim* is that one can, by looking *only* at the
minimum amino acid size needed for a function, declare *from that
knowledge alone* how easy or impossible it is to generate a particular
function. That is exactly what you do with the bacterial flagella. You
calculate the minimum amino acid size (by bogus methodology or not is
for later analysis) and then *declare* and assert that because the
minimum amino acid number is 6000 or 1000 or 400 or whatever, it is
impossible for a bacterial flagellum to evolve from *any* precursor
state. Using that same logic on ebg, which, as you point out, has NO
selectable lactase activity, and which also needs, as you point out, 400
minimum amino acids to have lactase activity, you should NOT be able to
generate any lactase activity from ebg by single mutations if you apply
your mathematical model of evolution to ebg. The reality is, of course,
that you can evolve lactase activity relatively easily. The reason is
that ebg is not some *random* protein or *random* sequence. Yet you
insist that you can still tell how hard it is to evolve a function based
solely on the minimal number needed for that function.

> My idea doesn't look
> at sequences so much as it looks at types of functions. What are the
> odds that a particular organism or group of organisms will have
> anything within their collective genomes that is close enough to
> evolve any type of new beneficial function within a given level of
> specified complexity? That is the important question.

And the answer is highly variable. Remember that ebg is NOT a lactase.
It does, however, have a glycoside binding site that can be converted to
binding galactoside linkages (no change is needed to hydrolyse that
linkage). And I would expect most bacteria to have *some* enzymes that
perform glycoside hydrolysis.  Now, a particular bacterium may NOT be
able to evolve a lactase from every one of its glycoside hydrolases
because the current function of that enzyme may be more important than
the gain obtained by having lactase activity. That is why duplication
of an existing enzyme is often the first step in evolution. But I
certainly would expect lactase to evolve from a pre-existing glycoside
hydrolytic enzyme rather than from some *random* protein or *random*
sequence, which is what the math of your model repeatedly proposes
(although you deny it just as often).


>
> Given this question, it is very interesting that the E. coli bacterial
> species seems to have a "spare tire" lactase gene that is just one
> mutation away from success.

Why is it amazing to you that a bacterium has other glycoside hydrolases?
It is interesting that the ebg hydrolase activity is dispensable in
conditions where lactase activity is crucial to survival. That is,
duplication was not a prerequisite here. But that is about all that is
interesting. Ebg is not present in E. coli to be a "spare tire"
lactase. If it were, it would have retained lactase activity. It has a
related function, but that function is not lactase activity.

> This would not be such an interesting
> finding if lactases were less specified than they are.  For example
> if the density of lactase sequences in the 400aa level of sequence space
> were say as high as 1 in a billion, the average gap between lactases
> would be less than 7 mutations wide.

Please show your math. This is hand-waving numerology. And irrelevant
because no one (but you, and you deny it) is arguing that evolution
works by starting with a *random* 400 aa sequence.

> For a colony of bacteria
> numbering say 10 billion individuals, this gap would be crossed in no
> more than several months by all types of bacteria. What is
> interesting though is the very "limited evolutionary potential" that
> many types of bacteria have when it comes to the evolution of this
> relatively simple enzymatic function. Without their spare tire gene,
> E. coli cannot evolve this lactase function despite very positive
> selection pressure, artificially elevated mutation rates, and tens of
> thousands of generations of time. Many other types of bacteria have
> not been able to evolve this relatively simple lactase function
> despite well over a million generations of documented observation.
>
> So, what does this mean? It means that the density of sequences with
> the lactase function is actually quite low. This low density is what
> limits the evolutionary potential of many organisms that would
> otherwise benefit from a lactase enzyme if they were able to evolve
> one. The fact that they do not evolve one means that the gap between
> what they have and the nearest lactase enzyme is simply more than a
> dozen fairly specified mutations away.
>
> > 5) Now let's take a different protein or protein system, also 1200 total
> > amino acids in length. We will make it a bit less complex, by making it
> > a single unregulated protein.
>
> The fact that a protein operates as a single unit does not make it
> less complex than a multiprotein function that requires the same
> minimum amino acid number and level of specificity.

Yet it seems that only really large multiprotein complex systems ever
reach the state of unevolvability. Really large single proteins never
do.

> Also, all protein
> functions are regulated in one form or another.
>
> > But this protein is in the histidine
> > pathway. That is, it is a random protein wrt lactase function, chosen
> > merely because of total amino acids present. Let's say that for *its
> > function* the very same "total number of amino acids required for a
> > particular type of function to be realized at its most minimum
> > beneficial level of function" exists. I want you to calculate the odds
> > of this protein evolving into a selectable beta-galactosidase enzyme
> > *based solely on the numbers you thik are important* and NOT based on
> > any other knowledge about this protein.
>
> Starting with a random sequence of 1,200aa acting in some beneficial
> manner, you are asking how long it would take to evolve a
> beta-galactosidase? Is that what you are asking?

That is what *you* repeatedly assume, in your mathematical analysis.
But I did not say it was a random sequence. I said it was a protein
which had its own function, but that function was completely unrelated
to glycoside hydrolysis. That is, unlike ebg, it has no glycoside
binding site and no active site for the hydrolysis of glycoside
linkages. That is what your entire calculation is based on. Starting
with either a random protein or a random sequence of 400 aa and evolving
lactase activity from that starting point.

> If so, then say the
> density of lactases in sequence space of 400aa minimum was low enough
> to require 24 specified mutations, on average, to go from one lactase
> island to another. If true, then, on average, a sequence of 400aa in
> a given gene pool would be around 12 specified mutations away from the
> closest lactase sequence creating a gap of 4,000 trillion non-lactase
> sequences. Say the colony size is 1 trillion individuals living in a
> steady state and the mutation rate is one mutation per 400aa per year
> per individual lineage (a pretty high mutation rate).  Well, starting
> with 1,200aa in a colony of 1 trillion would give us 3 trillion
> sequences of 400aa each evolving at the same time (given that this
> 1,200aa sequence was released from selective constraints perhaps via
> gene duplication). This means that each year 3 trillion sequences out
> of 4,000 trillion will be searched out. At this rate, on average,
> success will be realized in just over 1,300 years on average (defined
> as the evolution of a beneficial lactase function in one member of the
> population).

IOW, significantly more than the five years you allow in
experimentation.  And significantly more than was required when the
protein was a non-random protein like ebg.  Despite both ebg and this
protein having a 400 aa minimum to have lactase activity.
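
For once, the arithmetic itself is internally consistent; a three-line
check in Python, with your own figures plugged in:

    gap      = 20 ** 12     # ~4.1e15 sequences: 12 specified changes,
                            # 20 possible amino acids per position
    per_year = 3e12         # 3 trillion 400aa windows searched per year
    print(gap / per_year)   # ~1365 years, versus the 5 years allowed

Consistent, and still beside the point, because the scenario it
describes is not the one evolution is claimed to use.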



> > 6) Same thing, except now we have a completely random sequence of 1200
> > total amino acids.
> >
> > I will accept failure to evolve a selectable beta-galactosidase activity
> > in five years as evidence that your math is correct for *that type of
> > protein* (even though I really should wait a gazillion years, just to be sure).
>
> Ok, what is your counter argument?

My point is that neither of the last two is a proposed evolutionary
model.  The first one, where evolution proceeds by modification of a
pre-existing enzyme with related function, is doable in real time. The
first model produces lactase with one or two mutational steps. Your
models, which assume one or the other of the last two cases, would take
many more than five years. The last two represent the straw man
evolution your math assumes. Reality does not.

> If the density of lactases in
> sequence space of 400aa was very much less than what I based my above
> calculations on, then why were Hall's E. coli so limited in their
> ability to evolve a type of function with such a high density of
> sequences in sequence space?

E. coli has a *specific* set of completely *non-random* sequences in
sequence space. What matters, and the only thing that matters, is that
it had at least one sequence which *was* able to *rapidly* evolve
lactase activity. Part of this may be a consequence of the fortuitous
fact that ebg's original function is a dispensable function when
selection is strong for lactase activity. Otherwise, one would probably
need the fortuitous duplication of another glycoside hydrolase and its
subsequent modification.

> > That is because your bogus math does not take the specific ancestral
> > sequence and its pre-existing functionality into consideration.
>
> Actually it does. No matter what you start with you cannot get around
> the fact that on average your starting points will be a certain
> distance from new sequences with new types of functions. This
> distance gets exponentially larger, no matter what your starting
> sequences are, at higher and higher levels of specified complexity.

And how does one measure levels of specified complexity? Is not ebg
just as (or more) complex as the putative histone synthetic enzyme of
the same length in the discussion above? Yet the putative histone
synthetic enzyme was many more changes away from lactase activity than
ebg. It was about as far away as a random sequence would be. How much
specified complexity does a random sequence of that length have? How do
you calculate levels of specified complexity and what does it have to do
with evolutionary mechanisms?

> For example, let's just say, by a sheer extraordinary stroke of luck
> that an ancestral sequence in a bacterial colony just happened to be
> one or two mutations away from a new type of function as specified and
> complex as a flagellar motility system. Well, of course this highly
> complex system would evolve in short order now wouldn't it? Ok, but
> how many more such systems would it be able to evolve on average?

First, it wouldn't be sheer extraordinary luck for there to be several
to many functioning glycoside hydrolases in a bacteria. It would be a
matter of luck (but not sheer extraordinary luck) for one of these to be
only a few mutations away from being modified to a somewhat different
function (say improved ability to bind a formerly weakly bound secondary
substrate). Evolution is not teleologic in nature. Systems evolve by
tinkering with what exists, not by direct synthesis of a system from
design or from scratch (some random protein or random sequence). If
nothing exists for the tinkering to start with, and no intermediate
stages of functionality exist, it won't happen.

> How
> long would it take that colony to evolve another type of function at
> that same level of complexity or higher given what it now has to
> proceed with?

ebg has no lactase activity. It, therefore, evolved another type of
function at the same level of complexity (400 aa, according to you) as
lacZ. This occurred by a single mutation to generate selectable
function despite the fact that there is only 25% or so sequence identity
between the two proteins (and that is not much larger than chance
alone), and this initial selectable level of function was improvable and
improved on by a couple of subsequent mutations. Much of evolution
involves changes of function in families of genes which all have related
functionality just like this. This is evidenced by sequence identity
and structural and functional similarity of the existing genes. The
evolution of lactase activity in ebg is of this type. But so is the
evolution of the globin genes of hemoglobin and their relationship to
myoglobin. And the evolution of flagellar motility from a previous
TTSS-like non-motile function also involved only a few changes in
sequence. Most of the 'functional' relationships (which protein binds
to which other protein) were unchanged in going from a TTSS to a
flagellar function. The proteins that compose the bacterial flagella
are not a random sample of proteins; they are a non-random sample of
proteins that are very similar to the proteins that form a TTSS system.
Other evolution involves chimera formation to generate new function.
Examples of this include the antifreeze gene of certain Arctic fish, and
possibly the changes in allosteric effectors of transmembrane signalling functions.

> Odds are that everything it has will be gazillions of
> years away from any other type of function within such a level of
> complexity or higher.

Compare the level of complexity within a TTSS system without and one
with motility of the 'whip/protein export tubule'. Those two systems
are not separated by a very large change in 'level of complexity' nor,
necessarily, by hundreds of amino acid changes. And it is the
*difference* between selectable steps that is important. No one is
claiming that flagella arose *as flagella* by evolving them directly from
20 random proteins. Flagella arose by a modification of a pre-existing
TTSS-like system. And that system itself did not contain random
proteins. It contained, say, 19 proteins that were as similar to 19
flagella proteins as ebg was to lacZ. Again, the *only* way your
calculation makes any sense at all is if you are assuming that the
starting point is some *random* or *average* protein or *random* or
*average* sequence and that the only possible function is the teleologic
one you declare.

> In fact, the odds are so great against
> evolution at such levels that the witnessing of evolution at such a
> level should cause one to seriously look into the almost certain
> finding of a pre-existing system that had been lost for a time but
> whose code was still there pretty much intact.

Or a system that has a different current 'function', but which *can* be
modified, by tinkering around the edges, into a system with an emergent
function. Such as modifying a TTSS-like or other whip-like system to a
motility function (as has happened independently at least twice, in
eubacteria and archaebacteria). Or, if that pre-existing system is
unavailable, modifying a secretory system to a 'new' motility function.
Or if that pre-existing system is unavailable, modifying an internal
transport mechanism involving microtubules into undulopodia.  Or if that
pre-existing system is unavailable, modifying a different internal system
of transport (actin/myosin-like) to generate amoeba-like pseudopodia.
Motility, and the need or utility of motility, is occasionally a strong
selective force, and evolution tinkers with what exists to try to
generate a motility system. It doesn't start with random or average sequences.


>
> For example, cavefish who have lost their eyes still have the code to
> make eyes in their genome. It has been shown that a single point
> mutation can restore the production of fully formed eyes in the
> offspring of these fish. Does this mean that eye evolution has been
> demonstrated? Absolutely not.

It is a demonstration that one does not need to invent the wheel from
scratch and start with a random sequence.

> All this shows is that the evolution
> of such a highly complex system requires a pre-existing code for this
> system that has been shut down by a slight change to the system.

Yes, for the particular case you mention. But the existence of 'eyes'
of various intermediacy shows that intermediate states to eye formation
can certainly have independent functional utility and are not thousands
of changes from one another. Indeed, even the chemistry of vision shows
two independent re-utilizations of a retained primitive step (the first
steps using cis-retinol) and then co-option of other, independent
pre-existing steps to generate the nerve signal. And the independent
evolution of eyes in squid and humans use these quite distinct biochemistries.

> Without this historical existence of sightedness in the ancestors of
> these fish, they would never have been able to evolve the ability to
> see.

For fish, certainly.  But that was not the case for the independent origins of eyes in squid and fish.

> The same is true for flagellar motility. I know that you have
> suggested that the ability to mutate a flagellar system so that it no
> longer works as a motility system, just keeping its TTSS system
> intact, and then mutating back the motility function is an example of
> high complexity evolution in action.

No. I pointed it out as a trivial example, not one directly identical to
the steps involved in the evolution of the system. But just as there are
organisms with intermediate structures for vision that have independent
utility, the TTSS systems show that one does not need to have the full
dual functionalities of TTSS-like activity *and* motility in order to
have selectable activity. That is, the TTSS system can be selected for
its utility independent of the motility function. The same cannot be
said for the eye with no vision in cave fish -- it has gone from being a
structure selected for utility in vision to a vestige that has no
purpose wrt vision or anything else. It has no independent utility as a
vestige. But the pinhole eye of Nautilus does have selectable purpose.
The pit eyes of many invertebrates do have independent utility.
Cis-retinol/opsin can have selectable function even if disconnected from
the nervous system. But it certainly can have selectable function if it
interacts biochemically with the nervous system sending a signal about
the level of light received. And it does. Differently in vertebrates
and invertebrates, and not by making an entire new system from scratch,
but by interacting with a pre-existing pathway of nerve signalling
biochemistry.

> It really is nothing of the
> sort. It is on the same level as blind cavefish evolving their eyes
> back again.

No. The difference is that the TTSS system has an independent
selectable utility. The lost eye/vision in blind cavefish does not, but
the precursor to the vertebrate camera eye did -- it was useful for
vision, but not *as* useful as the camera eye.

> Without the pre-established code already being there and
> working at that type and level of functional complexity in the
> ancestors of that organism, such levels of complex function would not
> evolve in trillions upon trillions of years.

Have you read the Nilsson and Pelger paper?



> > The interesting thing, of course, is that this experiment (selection for
> > the evolution of galactosidase activity) actually has been run and your
> > model of calculating the odds has been tested. That is, the prediction
> > based on your methods of determining the odds of evolving a new function
> > has been subject to test. Did it pass the test for *all* of the
> > examples, or only for the examples where you start with a random protein
> > or a random sequence?
>
> You must understand that we are talking averages here. What is the
> average time required to evolve a new function at a particular level
> of complexity?

Evolution does not work via *averages* or *random* sequences. It works
by modifying pre-existing systems and structures and does so in a
step-wise fashion with frequent positions of independent selective
utility. One can no more calculate the time required to evolve a new
function from knowing only its particular level of complexity than one
can calculate the time it takes a car to reach Indianapolis by knowing
only the complexity of the make and model of the car. [Knowing how far
the car is from Indianapolis is much more useful in estimating the time
it will take it to reach that city than knowing the complexity of the
make and model. Similarly, knowing how many changes are needed in an
existing protein to get selectable activity is much more useful in
estimating the time it will take to get that selectable activity than
knowing the level of complexity of either that protein or the end result.]
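
To put rough numbers on that: if each required change is itself
selectable, waiting times add; if nothing is selectable until all the
changes are in place, waiting times multiply. A minimal sketch in
Python -- the per-step rate and population size below are arbitrary
illustrative values, not measurements:

# Illustrative scaling of waiting times for k required changes.
mu = 1e-9    # assumed rate of finding the next selectable step, per
N = 1e6      # individual per generation, in an assumed population of N

t_step = 1 / (N * mu)            # expected generations to find each step

for k in (2, 5, 10, 30):
    stepwise = k * t_step        # selectable intermediates: linear in k
    blind = 20.0 ** k            # blind search of sequence space: exponential
    print(f"k={k:2d}  stepwise={stepwise:.0f} generations  blind={blind:.1e} sequences")

The distance that matters, in other words, is the number of steps
between selectable way stations, not the size of the space.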

> In other words, what is the density of beneficial
> functions at various levels of complexity in sequence space?

That calculation is utterly without relevance.

> You must
> have some sort of idea of the density of beneficial functions in order
> to be able to estimate average evolutionary time requirements.

Why? Such a number is devoid of utility.

> Certainly it was very fortunate that E. coli had at least one and
> possibly two sequences within striking distance of a lactase sequence,
> but this does not mean that the density of lactase functions can be
> adequately estimated based the division of the number of genes in E.
> coli by the number of lactase sequences in E. coli. This is a
> fallacy.

For E. coli, it seems that the fallacy is reality. For E. coli, reality
is all that matters. It cares not that it is far from *average* and did
not evolve its new lactase functions from *random* proteins or *random*
sequences.

> By this method it would seem that the density of lactases
> sequences is as high as 1 in 1000 amino acid sequences. This is
> obviously incorrect or a beneficial lactase would be no more than 3
> mutations away from any 400aa sequence.

Actually, lactase activity may be no more than 3 mutations away from
some other glycoside hydrolase sequence. And all that matters is that
E. coli is fairly rich in glycoside hydrolases, including one that is
directly dispensable in environments rich in lactose (rather than
requiring a duplication event). But think of ebg as being the equivalent
of TTSS. Both happen to pre-exist in some organisms for reasons that
have nothing to do with their potential to evolve into lactase or
flagella, respectively. The organisms that do not have these precursor
systems (say bacteria like, possibly, Chlamydia or some of the
rock-eating microbes), won't be able to evolve lactase (nor have much
need for it) or flagellar motility (although they might evolve a
different kind of motility, say by extruding polysaccharides). Your
calculation of 'averages' only has significance if you think that
systems evolve from *random* or *average* sequences. I sure don't.
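
As an aside, it is easy to count what a density of one lactase per
thousand sequences would imply for a 400aa protein. A quick sketch in
Python (substitutions only, no insertions or deletions; the 400aa
length and the 1-in-1000 figure are the ones quoted above):

from math import comb

L = 400   # protein length from the example under discussion

def neighborhood(k):
    # number of distinct sequences within k substitutions of a given one
    return sum(comb(L, i) * 19**i for i in range(1, k + 1))

for k in (1, 2, 3):
    n = neighborhood(k)
    print(f"k={k}: {n:.2e} neighbors, ~{n/1000:.1e} expected lactases")

At that density one would expect tens of millions of lactase sequences
within three substitutions of *any* 400aa protein. Whether such an
overall density means anything for evolution is, of course, exactly
what is in dispute here.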

> The evolution of lactase
> would be lightning fast at this density in all types of bacteria.

Only if all bacteria were to have the same or very similar glycoside
hydrolases. Such systems undoubtedly are more common in those bacteria
that are in environments where there are significant levels of
glycosides and disaccharides useful as carbon sources.

> The far more telling evidence is found in the limited ability of lacZ
> and ebg negative bacteria to evolve the lactase function over the
> course of tens of thousands of generations. That observation gives a
> much clearer idea about just how low the density of lactase sequences
> really is.

And is utterly irrelevant unless you think that evolution of lactase
*must* happen in a selective environment that would favor it. The
really important point is that lactase *can* evolve in *some* organism.
That it evolves in an organism by modifying another glycoside hydrolase
is hardly surprising. Obviously it must be fairly difficult to evolve
new function or we would not see the vertical transmission pattern we
do. That is, it is likely that the cytochrome c's we currently see are
not the consequence of independent creation of cytochrome c in chimps
and humans or fish or bacteria, but the consequence of neutral drift of
cytochrome c's throughout history from an ancestral cytochrome c. That
is, cytochrome c function evolved only once and all subsequent versions
are derived from that event, either transmitted vertically (after
incorporation of the ancestral mitochondrial symbiont) or transmitted
horizontally (between bacteria and from bacteria into eucaryotes). If
it were possible to evolve cytochrome c every time or any time one
needed one, we would not have the nested pattern of descent in sequences
that we do. OTOH, some systems probably did evolve independently, and I
suspect that human and bacterial lactases would be included in that,
given that the human lactase clearly has, at minimum, a chimeric origin
for the terminus that is transmembrane.



> > Remember that no one (except your argument) is proposing that bacterial
> > flagella arose in one fell swoop with no intermediate states of utility.
> > Quite the opposite.
>
> Obviously that is what you evolutionists fervently believe and is
> actually what is required.

Yes.

> You must have intermediate steppingstones
> that are each selectably beneficial.

Yes. But not necessarily for the current (teleologic) function.

> Your problem is that you don't
> have these stones. Your proposed evolutionary pathways for the
> flagellar motility system are sorely lacking involving huge gaps that
> have never been crossed.

For which ones do you have *evidence* that they are uncrossable? Be
specific. But remember that unless you can say that NO such
intermediates CAN exist, your calculation, based as it is on the minimum
requirements of all the proteins, is nonsense based on a bogus idea that
the only way for the flagella to arise is to start with 20 *random*
sequences or *random* proteins and proceed by a *random* walk.

> Not even one of your proposed steps in the
> evolution of a flagellar motility system has been demonstrated to
> evolve in real life - not one. Come on now, where are these
> steppingstone functions?

TTSS activity is a stepping stone *function* that has obvious
independent selective utility. So are protein-export activities of
simpler structures obtained by removing some of the TTSS proteins. Such
simpler protein extrusion systems exist and are useful in bacteria. It
is quite likely that the initial motility function was secondary to the
TTSS-like function.

> They just aren't there because those types
> of functions at this level of specified complexity are so far away
> from all stepping stones that universes of sequences must be sorted
> through before any function at this level can be found.

The above is pure nonsense verbiage. Why can't TTSS activity be a
stepping stone function? And why can't simpler protein-export
structures be a stepping stone function to the TTSS function? The point
is that, to a very large extent, the *very same* structure can perform
*both* TTSS function and motility function. The sequences that perform
a TTSS function and the sequences that perform the motility function are
not "so far away" from each other that "universes of sequences must be
sorted through before" the *other* function (not just *any* function,
but the *other* function) "at this level can be found". In fact, one
can have TTSS function in a sequence which is *identical* to the
sequence that performs the motility function (bacterial flagella exhibit
both protein transport and motility functions). And one can have TTSS
function in which nearly all the proteins that perform this function
interact with each other in *exactly* the same way as the similar
proteins interact with each other in motile flagella (and they also have
significant sequence identity -- even more sequence identity than ebg
and lacZ), but which is missing, say, one of the proteins needed for
motility.

And yet, despite all this identity and similarity between TTSS and
motile flagella systems, you seem to be saying that all the stepping
stones between TTSS and flagella are missing and one has to sort through
a *universe* of sequences in order to modify the TTSS structures to a
new motility function? IOW, if you start at TTSS, evolution is supposed
to ignore the fact that the structures that form TTSS are essentially
identical to those that are involved in motile flagella and that they
interact with each other just like their flagellar counterparts. Nope.
The only way they can become part of mobile flagella is if they wander
through a universe of completely useless intermediates. It is just like
the fact that despite the 'function' and structure of ebg being not so
far removed from the structures required for lactase activity, evolution
must sort through universes of completely random sequences to find one
with lactase activity if you start with ebg. Yeah. Sure. And I have a
bridge I would like to sell you.

No. What degree of specificity is required for an amino acid to be
considered one of the 'fairly specified' ones? How is a 'fairly
specified' amino acid identified?


>
> You know, it would really help if you read the entire train of thought
> before you responded with long paragraphs with single words and
> sentences. I often answer many of your questions and statements in
> the very next sentence or paragraph, as I did this time.

No. You continue to be highly evasive and non-specific. Let me help.

One identifies an amino acid as 'fairly specified' by....

The way one identifies an amino acid as 'fairly specified' is ....


>
> > > Of these, 30 amino
> > > acid positions that are highly constrained - rarely involving more
> > > than one type of amino acid.
> >
> > Rarely is not never. How many are invariant?
>
> You are really hung up on this idea of absolute invariance. I would
> dare say that very few if any positions are absolutely invariant -
> taken one at a time.

How else can you take them? Ten at a time? One hundred at a time?
Exactly how are you determining 'constraint' and, from the
identification of 'constraint' the determination of 'fairly specified'?

> The same is true of a 100-character sentence.

Actually, no. Language is not like the amino acids of a protein in this
sense precisely because many more of the positions in language are and
must be invariant for particular words to have meaning in context.

> This, however, does not mean that this type of function is not highly
> constrained.

What 'function' are you talking about here?

> Character positions do not have to be absolutely
> invariant in order to be very highly constrained - right?

But what degree of variance do you consider to be 'very highly
constrained' and is this constraint at the amino acid level, the group
of ten amino acids, the group of one hundred? How are you determining
your numbers? All you keep doing is saying there is constraint and out
pops this number (400, 480, 100, 80) out of nowhere. I keep asking you
how you determine these numbers. Are they WAGs? Where can I find these
numbers in the literature defined exactly the same way you do and for
the very same purposes?

> > It seems to me that you
> > count an amino acid as 'fairly specified' if it exhibits *any*
> > constraint and treat it as if it were invariant.
>
> Come on now! Are you suggesting that a position limited to less than
> 3 different amino acids is not "fairly specified"? Give me a break .
> . .

And you determine this degree of specificity by what...? And how many
of the amino acids in your numbers are even this constrained?

> > Actually, I haven't
> > seen you calculate anything. All I have seen you do is say that there
> > are positions that are very highly constrained, positions that are
> > somewhat less constrained, and positions that are even less constrained.
> > Then you wave your hands and come up with a number.
>
> The math is fairly easy here. Given the constraints listed, you can
> calculate the number yourself.

What constraints have you actually listed as being definitive of 'fairly
specified'? And how did you determine that amino acids meet your
constraints? Again, even if correct, the numbers are meaningless,
because they presume that evolution starts with a *random* sequence.

> You will find that the 10e60 number is
> actually being quite generous given these listed constraints
> (referenced by Yockey and others). If you think otherwise, do your
> own calculation and tell me your results.

And what relevance do Yockey's numbers have for how evolution actually
works? They may be useful for some analyses of abiogenesis of
activities. But not for their subsequent evolution.


>
> > Again, it is well known that all *modern* cytochrome c's have a high
> > percentage of evolutionarily constrained sequences.
>
> You mean *functionally* constrained sequences.

Same thing. The constrained sequences are constrained due to their
functional relevance. That doesn't mean that they are invariant, of
course. The differences in sequence occur to the extent that they are
not functionally constrained and occur in a historical pattern (the
nested hierarchy) rather than in a pattern related to different
environmental or phenotypic demands (which would not be in a nested
hierarchy related to descent -- time since divergence).


>
> > That is because
> > cytochrome c is small and most of its amino acids are in contact with
> > the substrate. The high degree of evolutionary constraint is what makes
> > this sequence particularly useful for analyzing deep phylogeny.
>
> Functional constraints really aren't useful for studying actual
> evolutionary relationships. The differences are different because of
> different needs for different levels of a particular type of function
> because of different environmental and phenotypic demands on different
> organisms.

Can you give an example of this putative correlation between the
"environmental or phenotypic demands on different organisms" and the
differences seen? Is the cytochrome c of whales more closely related to
that of seals, walruses, fish, manatees than to deer and antelope or
hippopotamus? Is the cytochrome c of birds closer to bats than to
crocodiles? Or is the degree of difference more highly correlated with
time since divergence rather than "environmental or phenotypic demands"?

> Such functional differences in very different organisms
> may have always been there by design.

Only if the design purpose were to mimic an apparently historical
pattern of common descent consistent with the one derived from the
fossil record.

> Again, the only way to rule out
> design as the only logical explanation for such differences is to show
> that mindless evolutionary processes can also explain the differences

> beyond the lowest levels of informational complexity.

There is no logical way to rule out design, per se, until or unless you
specify what the design was intended to do and what design pattern one
would expect. To the extent you have attempted to do this (immediately
above, where you propose that differences are a consequence of design
for environmental specificity or phenotypic similarity) the design
hypothesis has failed. To the extent that one proposes that the design
pattern is similar to the type of nested hierarchy one expects from
common descent, of course, it will succeed.


>
> > The
> > same is true for histones, for much the same reason. But you typically,
> > and misleadingly, apply these percentages of evolutionary constraint to
> > large molecules that show a much, much lower amount of evolutionary
> > constraint.
>
> I do not apply these percentages to much larger single molecules. I
> have repeatedly said that larger single proteins often have far lower
> constraints than do smaller protein functions. Cytochrome c and
> histone proteins are very highly constrained - much more so than the
> larger lactase enzyme and other such larger enzymes and proteins. I
> use the smaller proteins as examples because their level of constraint
> is well-known and clearly documented. I use them as illustrations to
> show that increased constraint results in an equivalent reduction of
> density in sequence space. Larger proteins, though less constrained,
> may still be quite rare due to their minimum sequence size
> requirement, which is a different type of constraint. Though less
> constrained than a cytochrome protein, a lactase function requires
> over 4 times as many amino acids at minimum.

And yet you assert (without evidence) that roughly half of the amino
acids in lactase (480) are 'fairly constrained'.

> Still, the clincher
> comes when you start considering multiprotein systems where each of the
> smaller individual proteins have a fairly high level of amino acid
> specificity/constraint. Since all of these proteins are required to
> work together at the same time, their combined number of fairly highly
> constrained amino acids starts to really add up - into the multiple
> thousands of fairly specified amino acids working together at the same
> time. For the flagellar system I would say that this number is well
> over 5,000aa.
>
> Again, if you disagree with this number, prove me wrong. Tell me what
> you think the minimum genetic real estate is to code for a flagellar
> motility system.

I think any such number is utterly useless wrt any realistic analysis of
how selection works to produce modified function because evolution does
not work with a random sequence as its starting point and no
intermediate states of functional utility.

And the *average* or *random* density of functional sequences is
irrelevant to evolution, which always proceeds from a non-random
sequence. Completely irrelevant.

> And, if they were completely irrelevant
> you wouldn't work so hard at arguing against them and what they
> obviously mean.

I am merely trying to figure out how you came up with the numbers you
did. I still don't know. All I see is hand-waving and, suddenly, out
of thin air, a number appears. That the numbers are irrelevant to how
evolution works since they assume that evolution works by starting with
a *random* sequence and proceeding via a *random* walk with no selectable
intermediates is a second question.


>
> > The *only* model of evolution that you can use
> > these numbers to test is the model your math tells us you are testing.
> > Specifically, these numbers can tell us only the odds of the modern
> > system evolving from the starting point of a random sequence by a
> > process of a complete random walk with no possible intermediate states
> > of utility. Since no one proposes that any *modern* protein or system,
> > with rare exception like the nylonase case, evolves this way, your
> > numbers are irrelevant *even if* they are correct.
>
> Ok, how else do new types of functions evolve? Explain it to me
> again.

By modification of pre-existing function and/or structure in the
organisms in which such a pre-existing function and/or structure is present.

> It seems to me that to get a new type of function you must
> evolve new sequences that actually have new types of beneficial
> functions.

One evolves changed or modified pre-existing sequences such that an ebg
gene now binds and hydrolyzes galactose-based glycosides rather than
some other glycoside linkage. One evolves a single protein by modifying
a linkage of a pre-existing motor system to interact with a TTSS system
to create a modified TTSS system with some selectable level of rotary
motility. One evolves a visual system by modifying a cGMP protein that
interacts with and is activated by light striking a cis-retinol/opsin so
that it activates a pre-existing biochemical pathway that affects
electrical impulses across membranes. Invertebrates and vertebrates
independently chose different pre-existing pathways. In none of these
cases is there any 'new' sequence being created from some *random* sequence.

> Higher levels of functional complexity had to involve the
> previous evolution of steppingstone functions, each of which was
> selectably advantageous. If these steppingstones are not there, then
> evolution is impossible. If they are there, then evolution is not
> only possible, but easy.

The 'stepping stones' (i.e., the intermediate states of functional
utility) need not be selected for the teleologic end function (indeed
for longer pathways, it is often axiomatic that the end function will
not be the intermediate function). That is, directly selecting for the
end function may be futile unless one already has an intermediate that
was selected for some quite different function. For example, selecting
for motility by virtue of a rotary flagellum may be quite useless until
or unless one has a flagellar structure selected for entirely different
reasons (such as protein export). Because these intermediate states
have *independent functional utility* there can be very long periods in
which evolution does not proceed at all. So the 'easiness' of something
evolving is not a function of its overall complexity at all. It all
depends upon what the starting point is. In a bacterium with an ebg
protein, which is there for reasons unrelated to lactase activity (since
it has none), evolving lactase activity is, indeed, "easy". In a
bacterium without an ebg protein, but only glycoside hydrolases which are
vital and cannot be changed to lactase without leading to death,
evolving a lactase may require a *random* duplication event as a
precursor event. With that event, evolving a lactase may also,
subsequently, be just as "easy" as evolving lactase from ebg. But
knowing the density of sequences with lactase activity among all
sequence space (the ratio of sequences with selectable lactase activity
divided by all possible sequences) would tell us precisely nothing about
the ease or difficulty of lactase activity evolving. It would be
useless information.

> Do you know another way?

I know for certain that your method starting with a *random* sequence
and proceeding by a *random* walk with no useful intermediates is
definitely *not* the one that evolution uses. That makes all those
numbers you use irrelevant.


>
> > > It is quite obvious that the cytochrome c type of function is rather
> > > constrained by both its minimum amino acid requirement
> >
> > You *still* haven't told me how you calculated "minimum amino acid
> > requirement" besides say that some positions in cytochrome c are
> > strongly conserved, others less conserved, and others still less. Then
> > we get a wave of your hand and a number.
>
> Again, the minimum amino acid requirement is different from the
> minimum level of constraint.

And you haven't told me how to calculate either one. Or even estimate it.

> The minimum amino acid requirement is
> not a calculated number, but is estimated based on the shortest
> sequence found in a living thing having this type of function.

Then it shouldn't be called the 'minimum amino acid requirement'. To
call it that implies that you actually have demonstrated that the number
is a *minimum* number in an absolute sense. It should be called the
'shortest *known* sequence with lactase (or whatever) activity'. And,
since it is directly based on a real number (the shortest known sequence
having this activity), you certainly are not (or shouldn't be)
"estimating" the number. Basically, unless you *know* that the
"smallest known sequence" is the smallest *possbile* sequence with that
function, you are not justified in calling it the minimum amino acid requirement.



> > > as well as the
> > > fairly high degree of specificity required by the sequencing of these
> > > amino acids. Well over 60% of this protein is restrained to within 3
> > > amino acid options out of the 20 that are possible.
> >
> > The average time for a 100% probability of replacement of a single
> > selectively neutral amino acid is 100 million years, Sean.
>
> Not at all. Taking a mutation rate of 1e-6 mutations/generation, a
> population of just 10 billion would realize such a "neutral" mutation
> in just one generation in many of its individuals.

I didn't say that a "neutral" mutation would not occur more frequently.
I was talking about neutral *replacement* of the ancestral amino acid in
the wild-type sequence in the population. That is why I said
*replacement* of a selectively neutral amino acid.
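
The distinction is easy to put in numbers. A minimal sketch, using the
1e-6 per-gene rate and the 10-billion population from the post above,
plus the standard 1/N fixation probability for a new neutral mutation
in a haploid population:

mu = 1e-6    # per-gene, per-generation mutation rate (quoted above)
N = 1e10     # haploid population size (quoted above)

new_mutant_copies = N * mu      # ~10,000 carriers arise each generation
p_fix = 1 / N                   # but each new neutral copy fixes w.p. ~1/N
replacement_rate = new_mutant_copies * p_fix   # = mu, independent of N

print(new_mutant_copies, p_fix, replacement_rate)

Occurrence is cheap; *replacement* of the wild type is what takes the
time.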

> Of course, many of
> the variations in such proteins as cytochrome c are not neutral, but
> are functionally beneficial and maintained as such by natural
> selection (see below).

Where is your evidence of this? Especially the evidence for a
relationship between environmental conditions and sequence that is
stronger than the relationship between time since divergence and sequence.


>
> > If two of
> > the sequences you examine were eutherian mammals (which separated some
> > 60 million years ago), that would mean that 40% of their sequence would
> > be identical *EVEN IF* every amino acid in their cytochrome c were
> > selectively neutral and free to drift.
> >
> > And it would take around 300
> > million years for all the possible amino acids due to neutral selection
> > to be reached from an ancestral sequence *by chance alone* (selection,
> > of course, works much, much faster -- indeed, in certain environments
> > one generation is enough time) because you would need mutation at more
> > than one site in a codon. Have you taken this into account at all? I
> > sure can't tell, because all you do is make a series of statements and
> > then wave your hand to come up with a number. Not that it really
> > matters, since even if the number were correct, it would be irrelevant.
>
> Consider that the average mutation rate for a given gene in all
> creatures, is about 1 x 1e-6 mutations per gene per generation. That
> means that a given gene will mutate only one time in one million
> generations on average.

I prefer the rate of mutation at a single nucleotide, which, on average,
is 10^-9 or 10^-8 (with a high variance). One such mutation will occur
every generation if the population size is 10^9. And the rate of
neutral replacement for a selectively neutral site will be about 10^-9
per generation. This results in a rate of amino acid replacement of 1%
per million years at selectively neutral sites (as estimated by the rate
of change in the fibrinopeptide sequence).

> Consider that single celled organisms have a
> much shorter generation time than multi-celled organisms on average.
> For example, the bacteria E. coli have a minimum generation time of 20
> minutes compared to the generation time of humans of around 20 years.

The generation time of E. coli is only 20 min under optimal conditions,
such as a rich media. And during the 20 years, the cells that form the
sperm (eggs are a bit different) have been dividing for at least 20-40
generations. And E. coli (and many other bacteria) have more efficient
repair mechanisms. Which value do you use? Point is, there is a lot
less difference than you might think by a superficial analysis.

> With a gene being mutated every 1 to 10 million generations in E.
> coli, one might think this would be a long time. However, each and
> every gene in an E. coli lineage will get mutated once every 40 to 80
> years. So, in one million years, each gene will have suffered at
> least 10,000 mutations. Also consider that the population of single
> celled organisms on earth is a lot higher than the populations of
> multicelled organisms. For example, there are almost 6 billion people
> living on earth today but more than 100 billion E. coli living inside
> just one person's intestines.

Is there a point here?
>
[snip math wrt cytochrome c which will be discussed separately]


>
> > > That is a
> > > significant constraint wouldn't you say? In fact the most generous
> > > estimates of the total number of possible cytochrome c sequences in
> > > sequence space that I have come across, based on these constraints,
> > > suggest no more than 10e60 cytochrome c sequences exist. If you know
> > > something different than I have suggested here, please do show me
> > > otherwise.
> >
> > That certainly is a large number of potential different cytochrome c
> > sequences (that is, sequences with quite significant cytochrome c
> > activity). And the question is, of what possible relevance is that
> > knowledge to what you are saying wrt the impossibility of evolution?
> > Unless, of course, your logic is that cytochrome c sequences arise by a
> > completely random process from a random sequence?
>
> This number has to do with the density of sequences with this type of
> function in sequence space. Knowing the density of beneficial
> functions is the only way to understand the potential and limits of
> evolutionary processes since evolution works via a crossing of
> functionally beneficial steppingstones to new types of functions. If
> the steppingstones aren't close enough, evolution doesn't happen.

But that is *NOT* how you have been using these numbers. You merely say
that evolution is impossible because of the density of sequences with
the final teleologic function in all of sequence space. No mention of
stepping stones.


>
> > > Now you ask, "What value do I give such numbers?" What does this
> > > estimate mean? Given that the total number of possible sequences in
> > > sequence space requiring a minimum of 80 amino acids is well over
> > > 10e100, the ratio of cytochrome c sequences to non-cytochrome c
> > > sequences is less than 1 in 10e40 sequences.
> >
> > SFW? You keep coming up with this bogus ratio that presumes that
> > cytochrome c, or whatever protein or system one is talking about, is
> > derived by starting from a random sequence and generating the final
> > result by a completely random walk with no possible states of
> > intermediate utility. And then you turn around and deny that that is
> > what you are saying.
>
> Not at all. There could be intermediate stepping stone functions
> between the original starting point and a cytochrome c function.

We will never know from your analysis.

> However, the odds that these steppingstones are close enough to a
> cytochrome c sequence or any other beneficial sequence within that
> level of complexity is determined through an understanding of
> functional densities of sequences within sequence space.

I would guess that all the proteins that bind heme in the same way that
cytochrome c does are pretty close together in sequence space. And I
would guess that there are only a relatively few amino acids that are
actually *required* to *bind* the heme. And I would hazard a guess
that once you have a protein that binds heme, you might have selection
*for* any changes that restrict electron transport to a particular edge
by preventing face to face interaction between hemes. Pretty soon, but
hardly immediately by one mutation, you would have a protein that bound
heme and also covered the faces, thus directing electron transfer to the
desired edges. Which step in this process cannot be an improvement
over the previous state?

> Although
> highly specified, the cytochrome c function does not require all that
> many amino acids at minimum. Its function is certainly within
> striking distance of what I suspect most "original" genomes would have
> to begin with. However, going very far above this level of specified
> complexity becomes a really big problem really fast.

How is "level of specified complexity" calculated?


>
> > > This translates into an
> > > average gap of 30 amino acid changes between various islands of
> > > cytochrome c sequence clusters within sequence space.
> >
> > And how is this "average gap" calculated from those two numbers? Be
> > precise.
>
> This isn't higher math here. If the sequence space gap that must be
> crossed is 10e40 then that works out to be a bit over 20^30, or
> slightly over 30 specified amino acid positional changes on average.

Assuming that every one of those positions requires an *invariant* amino
acid different from the current one. How many invariant positions in
cytochrome c are there, again? And isn't that sequence space gap of
10e40 based on the idea that one is starting from a *random* sequence
and proceeding by a *random* walk with no selectable activity until
*every* position has just the one right absolutely required amino acid?
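
For what it is worth, the arithmetic itself is easy to check, whatever
one makes of the assumptions behind it:

from math import log
print(log(1e40, 20))   # ~30.7, so 10^40 is indeed roughly 20^30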


>
> > And is that or is that not the "average gap" between a
> > *random* protein or a *random* sequence and some sequence that has
> > cytochrome c activity, just as all your other calculations are based on
> > putative gaps between *random* proteins and *random* sequences and some
> > teleologically determined end point with the proviso that only a random
> > walk with no functional intermediates is possbile between the two states.
> >
> > Of course, as my little exercise with ebg shows, evolution does not work
> > by starting with a *random* protein or a *random* sequence. And all
> > that matters is the number of mutational steps required to generate a
> > selectable activity even if that activity is not the end activity in
> > question. IOW, your numbers are totally irrelevant. Not *even* wrong.
> > But as useful as knowing the distance between earth and the furthest
> > galaxy is to determining how one gets from L.A. to San Francisco.
> > Utterly irrelevant.
>
> It is not utterly irrelevant. Don't you understand the concept yet?

Yes. GIGO is a pretty easy concept to understand.

> These estimates help determine the odds that L.A. will actually be as
> close to San Francisco as it is. If the average distance from one
> beneficial function to another is the size of the universe, that tells
> you not to put too much money on the bet that starting with L.A., that
> some new beneficial place like San Francisco, will be just a few
> hundred miles away.

Boy. You had to work at twisting that one. The fact is that knowing
the distance between earth and a distant galaxy is of absolutely no use
in determining the possibility or probability of getting between L.A.
and San Francisco. One certainly cannot use the same mechanism for
getting from the earth to the farthest galaxy; indeed it is probably
impossible to get from here to there by any mechanism. But evolution
only asks how you can get from L.A. to San Francisco, not how you can
move from one spot in the universe to some random other place in the
universe. The trip between L.A. and San Francisco is doable, just as
the distance between ebg and lactase is doable. The information about
the average distance between earth and some other random spot in the
universe would be irrelevant.

> That is what Las Vegas is all about - predicting
> the average amount of time it takes someone to win. If the average
> amount of time it takes for evolution to "win" a new type of function
> at a particular level of complexity is a trillion years, what does it
> matter if it happens to win tomorrow? In the end, the average is
> still trillions of years.

Do you actually know of some biologically relevant function that is that
far away from some other pre-existing biological function? If you do,
then *that* function probably has not evolved from *that* starting point
and is not currently seen on the earth.

> Just like gambling in Las Vegas, if you
> keep playing the evolution game too long, even if you have an early
> "win", you will eventually loose.

You would always win (and relatively quickly) the slots if you could
keep the positions you needed and just randomly change the remaining
ones. You could always (and relatively quickly) wind up with a royal
flush if you could just toss back the 'bad' cards and keep the cards
you want. That is why they don't let you play that way in Las Vegas.
That is why they make you start from scratch (a random deck) each play.
Hell, they don't even allow card counters in blackjack. Evolution doesn't
play by Las Vegas rules. Evolution doesn't start with a randomly
shuffled deck after each hand and doesn't require you to toss in your
hand and start a new one each time.
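
The difference between the two sets of rules is easy to demonstrate
with a toy simulation. This is a sketch only -- ten positions and
twenty symbols chosen purely for illustration, with a fixed target,
which real evolution does not have -- but it shows what happens to the
arithmetic when matches are kept rather than re-dealt:

import random

ALPHABET = "ABCDEFGHIJKLMNOPQRST"   # 20 symbols standing in for amino acids
TARGET = "QRSTABCDEF"               # arbitrary 10-position target
random.seed(1)

def keep_matches():
    # non-Vegas rules: keep positions that already match, redraw the rest
    seq = [random.choice(ALPHABET) for _ in TARGET]
    trials = 0
    while "".join(seq) != TARGET:
        trials += 1
        seq = [t if s == t else random.choice(ALPHABET)
               for s, t in zip(seq, TARGET)]
    return trials

# Vegas rules: every trial is a fresh random deal, so the expected number
# of trials is 20**10 (~10^13); we report it rather than run it.
print("keep-the-matches trials:", keep_matches())   # typically under ~100
print("fresh-deal expectation :", 20**10, "trials")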


>
> > > In other words,
> > > if a particular organism that did not have a cytochrome c function but
> > > would benefit from this type of function, its current proteins would
> > > differ from the nearest potential cytochrome c sequence by an average
> > > of over 30 amino acid positions.
> >
> > Evolution would not *start* with the average protein. It would start
> > with the extreme end of the bell-shaped distribution of proteins that is
> > only 1 or 2 amino acids away from a selectable function.
>
> And what are the odds of that? At higher and higher levels of
> complexity, the odds that a new type of beneficial function will only
> be "just 1 or 2 steps away" becomes more and more remote in an
> exponential fashion.

Only if one must start from scratch with no useful intermediates.
Otherwise, regardless of the final complexity of the system, each step
in its evolution may be no more difficult than 1 or 2 amino acids away.
But to answer the question in front of me, even in the silly
hypothetical case of a random bell-shaped distribution of sequences, if
the *average* starting protein out of two thousand (the human genome
would be 15 times larger at least) is only 30 positions away, that
necessarily means that about half of the proteins will be less than 30
positions away. The answer would depend on what the variance is around
the average. Let's say that the protein we need (one fewer than, say,
five positions away) is two standard deviations away from the mean
or average protein. That means that roughly 2.5% of all proteins (or
about 50) would fit these criteria. The above is a silly exercise, of
course, because proteins in an organism are not distributed around a
mean or average protein. They are present as functional families.
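
The 2.5% figure is just the upper tail of a normal distribution beyond
two standard deviations, which is easy to check against the two
thousand proteins used above:

from statistics import NormalDist

tail = NormalDist().cdf(-2)   # one-sided fraction beyond 2 SD, ~0.023
print(tail, 2000 * tail)      # ~0.023, i.e. roughly 46 of 2000 proteins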

> The average distance is an extremely important
> concept.

It is a concept of no relevance whatsoever wrt evolution. Evolution
does not modify any *average* or *random* protein to get a new function.

> It means that just because it happened once does not mean
> that it will happen again, if ever. The averages get so enormously
> small in short order that betting that evolution will succeed over
> time becomes an exercise in insanity.

What is insane is your math, which is NOT based on any known
evolutionary mechanism, but based on a straw man.


>
> That's all for now . . .
>
> Sean
> www.naturalselection.0catch.com


Howard Hershey

unread,
Jan 4, 2004, 1:06:25 PM1/4/04
to
>>Sean Pitman wrote:


> Now, cytochrome c phylogenies are generally based on analysis of
> certain subunits of cytochrome c which range in number of amino acids
> up to a maximum of about 600 or so.

Having taken a look at a sampling of the more than 200 cytochrome c's
(including many bacterial versions) currently available, I can say the following:

Go to
http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF00034
and choose "Full Alignment" in the alignment section
Then click on "Get Alignment"

or try

http://www.sanger.ac.uk/Software/Pfam/data/jtml/full/PF00034.shtml

In no case that I saw is the cytochrome c *portion* of any protein --
even of proteins much larger than 100 aa -- itself much larger than
100aa. That is, all sequence over about 70-100 is unrelated to cyt c (or
a duplicate of it). In some cases, almost entirely in bacterial
sequences, the cytochrome c sequence is embedded within a multifunctional
protein. Sometimes the multifunctional protein has other activities in
the amino end, in other cases the unrelated activities are at the
carboxy terminus. Sometimes the cyt c sequence is duplicated within
these larger peptides.

Shame on you, Sean, for not even following *your own rules* for
determining minimum sequence or functional sequence size!

For example, take the sequence of Pseudomonas stutzeri. It is nominally
695 aa long. However, the only portion that is homologous to other
cytochrome c's is between aa 607 and 682, meaning that the *relevant*
cyt c functional sequence is actually only 75 aa long. And even that
sequence has variance compared to other cyt c sequences.

Such formation of multifunctional proteins is not unusual in bacteria.

The smallest cyt c relevant sequence that I found (Pseudomonas putida --
interestingly in the same genus as the 695 aa 'cyt c' of P. stutzeri)
with a *cursory* look came from a sequence of 107 aa, of which only
sequences 38-107 (the carboxy terminus) were cyt c related sequences
(the first 31 were signal sequences or sequences of a different class of
proteins). That puts the minimum aa sequence at 69, not all of which
are probably 'fairly specified' (whenever Sean gets around to defining
that term). Indeed, in many of the sequences, the variation in sequence
is mostly in the carboxy and amino terminal regions, but there is a
class of variations in an internal region as well.

Not surprisingly, there is a lot less variation in number or sequence
within the eucaryotes (most sequences there are in the range of 100 aa,
especially for the vertebrates).

> This would translate into a
> minimum of at least 1,800 nucleic acids in DNA coding for this subunit
> of cytochrome c protein.

Note how Sean somehow morphs into treating the *largest* protein that
has cytochrome c activity (even though that is only found in a 75 aa
stretch within that protein) as if it were composed of all cytochrome c
functional protein.

> Note that the tetrahymena species are about
> 50% different from all other creatures. It seems then that all the
> creatures would have experienced at least a 25% change in their
> genetic codes from the time of common ancestor. So how many
> generations would it take to achieve this 25% difference?

You must have been getting your analysis from some place with a decided
eucaryotic bias. Now, admittedly, cyt c did appear in eucaryotes by the
event that generated mitochondria. That is, by an event of horizontal
transfer. That means that any analysis of pattern might well have all
eucaryotes clustering within the bacteria sequences, depending upon
there being some bacterial lineages with current living species that have
a last common ancestor that diverged earlier than the horizontal event
that produced eucaryotic mitochondria.



> Taking 25% of 1,800 gives us 450 mutations. Let's say that the average
> mutation rate is one mutation per 1,800 nucleic acids per one million
> generations. For a steady state population of just one individual in
> each generation it would take about 450 million generations to get a
> 25% difference from the common ancestor.

A population of ONE! Must have a hard time f**king itself to produce
its single progeny in the next generation. Do you understand the
meaning of GIGO?

> With a generation time of 20
> minutes (ie: E. coli),

What is the generation time of E. coli in nature? Hint: Not 20 minutes.
What is the relevant generation time of human cells? Hint: Not 20
years. And how long have humans with generation times of 20 years existed?

> that works out to be about 342,000 years.
> However, with a steady state population of say a trillion trillion
> individuals (the total number of bacteria on earth is somewhere around
> five million trillion trillion or 5 with 30 zeros following), one
> might expect that the number of generations required to get a 25%
> difference would be a bit less.

According to the rules, the *rate* of fixation of neutral mutations in
a population is independent of population size. It is solely dependent
on the mutation rate. Thus not only is your calculation flawed, it
demonstrates that you do not understand neutral fixation. The
*observed* rate of fixation of neutral aa mutations is about 1% per
million years (estimated from the rate of aa substitution in
fibrinopeptide). This would be approximately true for E. coli and for
H. sapiens. This alone means that all your subsequent hypotheses that
there should be bacteria with human cyt c sequences are nonsense.

> So, for bacteria, the 25% difference
> from the common ancestor cytochrome c, might have been achieved
> relatively rapidly given the evolutionary time frame (a couple hundred
> thousand years or so).
>
> The question is then, if bacteria can achieve such relatively rapid
> neutral genetic drift, why are they not more wide ranging in their
> cytochrome c sequences?

Because of evolutionary constraint (aka selection), which is
particularly strong when a large fraction of the amino acids are
involved in contact with the substrate. However, this strong selection
bias is a two-edged sword. It makes the evolution *of* a functional
sequence from a less useful one much more rapid (and certainly much more
rapid than the subsequent neutral fixations), so long as each step in
the process (each change) has a selectively significant effect. The
stronger the selective effect, the more rapid the fixation.
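
Both halves of that -- new neutral copies almost always die out, while
even modest selection fixes variants far more often and far faster --
show up in a toy Wright-Fisher simulation. A minimal sketch; the
population size and selection coefficient are arbitrary values chosen
only for illustration:

import random

def new_mutant_fate(N, s, trials=2000):
    # haploid Wright-Fisher: follow one new mutant with relative fitness
    # 1+s until it is either lost (count 0) or fixed (count N)
    fixed, fix_times = 0, []
    for _ in range(trials):
        i, t = 1, 0
        while 0 < i < N:
            p = i * (1 + s) / (i * (1 + s) + (N - i))  # weighted frequency
            i = sum(random.random() < p for _ in range(N))
            t += 1
        if i == N:
            fixed += 1
            fix_times.append(t)
    mean_t = sum(fix_times) / len(fix_times) if fix_times else None
    return fixed / trials, mean_t

random.seed(0)
N = 500
print("neutral  (s=0)   :", new_mutant_fate(N, 0.0))   # ~1/N fix, slow
print("selected (s=0.05):", new_mutant_fate(N, 0.05))  # ~2s fix, fast

With these toy values the neutral mutant fixes roughly 1/N of the time
and takes on the order of 2N generations when it does; the selected one
fixes roughly 2s = 10% of the time, and in far fewer generations.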

> It seems that if these cytochrome c sequence
> differences were really neutral differences, that various bacterial
> groups, colonies, and species, would cover the entire range of
> possible cytochrome c sequences to include that of mammals.

There are degrees of constraint, from a position which must be invariant
to a position which allows all possible change. Moreover, there can be
interactions between amino acid positions, whereby a change in one site
can allow a change in a different site that would otherwise be selected
against.
And there are even changes, such as internal factor-of-three nucleotide
deletions/additions, additions to either the carboxy or amino terminus,
or even duplications or the additions of entirely different sequence
moieties to make large complex proteins that can occur without affecting
cyt c function.

But why would you expect bacterial sequences to cover the entire range
of mammals if there is evolutionary constraint? What I would expect is
that all eucaryotic sequences should nest (if all eucaryotic sequences
are due to a single horizontal event, of course) and should nest within
the bacterial sequences appropriately depending upon the source of
mitochondria. What would you expect?

> Why are
> they then so uniformly separated from all other "higher" species
> unless the cytochrome sequences are functionally based and therefore
> statically different due to the various needs of creatures that

> inhabit different environments?

I am not clear what you mean here. Are you asking "why do mammalian cyt
c's form a nested series branching from other multi-cellular organisms"
or are you talking about where bacteria fit in?


>
> For example, bacteria are thought to share a common ancestor with
> creatures as diverse as snails, sponges, and fish dating all the way
> back to the Cambrian period some 600 million years ago. All of these
> creatures are thought to have been around quite a long time - ever
> since the "Cambrian Explosion." In fact, they have all been around
> long enough and are diverse enough to exhibit quite a range in
> cytochrome c variation. Why then are their cytochrome c sequences so
> clustered? Why don't bacteria, snails, fish, and sponges cover the
> range of cytochrome c sequence variation if these variation
> possibilities are in fact neutral?

Because cytochrome c sequences are not all selectively neutral. There
is a wide range of constraint, from very little to invariant. The net
effect, in cytochrome c, is a relatively large degree of constraint,
with most of the variance limited to particular regions of the protein
or peptide sequence. Other proteins or peptides, especially larger
ones, show a much lower degree of constraint. The high degree of
constraint is what makes cyt c so useful for deep phylogeny. If cyt c
were, in fact, as selectively neutral as fibrinopeptide is, it would
be useless for deep phylogeny. It would be useless after about 100
million years. If it were as constrained as hemoglobin, it would be
useless for analyzing the relationship of bacteria, but still useful for
analyzing relationships among vertebrates.

> In other words, why are there not
> at least some types of bacteria that share sequence identity with
> humans?

The two last shared a common ancestor quite some time ago. The reason
why humans and chimps share identical sequences is because they share a
recent common ancestor.


>
> I propose that the clustered differences that are seen in genes and
> protein sequences, such cytochrome c, are the result of differences in
> actual function that actually benefit the various organisms according
> to their individual needs.

Then you would be wrong. The nesting pattern of similarity is related
to time since divergence, not morphological or physiological or
environmental similarity. That is, humans and chimps will show more
identity, despite being morphologically dissimilar and living in
environments as different as the ones Eskimos inhabit. Frogs that have
been separated in S. America and Africa, but are morphologically
similar, inhabit very similar environments, and are nearly identical
physiologically -- yet have been separated for 40 million years (as
opposed to 5 million years for chimps and humans) -- will show more
sequence variation. Your hypothesis has been proposed, checked out, and
discarded. When creationism is not bogus non-science it is refuted science.

> If the differences were in fact neutral
> differences, there would be a vast overlap by now with complete
> blurring of species' cytochrome c boundaries - even between species as
> obviously different as humans and bacteria.

Anybody who has studied this seriously has come to a quite different
conclusion. Obviously, one certainly needs to worry about the
re-ratting problem when one has a highly constrained protein. That is,
since a particular aa site can only vary between 3 amino acids, for
example, it is entirely possible for such a site to have gone from an
ancestral amino acid to a lineage which has one of the other two
possibilities, and be in a branch off of that lineage which has reverted
back to the original aa. There are statistical mechanisms for
recognizing such re-rats. As long as a significant number of amino
acids are examined and constraint is not absolute in all of them, it is
usually possible to identify such cases.

> Because of this, sequence
> differences may not be so much the result of differences due to random
> mutation over time as they are due to differences in the functional
> needs of different creatures. I think that the same can be said of
> most if not all phylogenies that are based on genotypic differences
> between creatures.

Again, this can be and has been tested. The nested pattern of
relationship generated by statistical methods of minimum parsimony, etc.
does not form a pattern that is a function of functional needs. It forms
a historical pattern related to time since divergence.

>
> In 1993, Patterson, Williams, and Humphries, scientists with the
> British Museum, reached the following conclusion in their review of
> the congruence between molecular and morphologic phylogenies:
>
> "As morphologists with high hopes of molecular systematics, we end
> this survey with our hopes dampened. Congruence between molecular
> phylogenies is as elusive as it is in morphology and as it is between
> molecules and morphology. . . . Partly because of morphology's long
> history, congruence between morphological phylogenies is the exception
> rather than the rule. With molecular phylogenies, all generated
> within the last couple of decades, the situation is little better.
> Many cases of incongruence between molecular phylogenies are
> documented above; and when a consensus of all trees within 1% of the
> shortest in a parsimony analysis is published structure or resolution
> tends to evaporate."

This was written a decade ago.
>
> http://naturalselection.0catch.com/Files/geneticphylogeny.html

RobinGoodfellow

unread,
Jan 5, 2004, 2:18:29 AM1/5/04
to
RobinGoodfellow <lmuc...@yahoo.com> wrote in message

In case my previous correction didn't get through (looks like it
vanished into the murky depths of Google):

> Same "sequence space" as before, but now a sequence is "beneficial" if
> it is AAAAAAAAAA......AAA (all A's), or it differs from AAAAA...AAA by
> at least 2 amino acids.

This should say "*more than* 2 amino acids".

Sean Pitman

unread,
Jan 11, 2004, 5:29:37 PM1/11/04
to
Howard Hershey <hers...@indiana.edu> wrote in message news:<3FF89BDA...@indiana.edu>...

> >>Sean Pitman wrote:
>
> > Now, cytochrome c phylogenies are generally based on analysis of
> > certain subunits of cytochrome c which range in number of amino acids
> > up to a maximum of about 600 or so.
>
> Having taken a look at a sampling of the more than 200 cytochrome c's
> (including many bacterial versions) currently available, I can say the following:
> In no case that I saw is the cytochrome c *portion* of any protein --
> even of proteins much larger than 100 aa -- itself much larger than
> 100aa. That is, all sequence over about 70-100 is unrelated to cyt c
> (or a duplicate of it). In some cases, almost entirely in bacterial
> sequences, the cytochrome c sequence is embedded within a multifunctional
> protein. Sometimes the multifunctional protein has other activities in
> the amino end, in other cases the unrelated activities are at the
> carboxy terminus. Sometimes the cyt c sequence is duplicated within
> these larger peptides.
>
> Shame on you, Sean, for not even following *your own rules* for
> determining minimum sequence or functional sequence size!

Don't you understand? In this situation it is to your advantage that
the length be as long as possible. I was making things as good as I
possibly could for your position. The shorter the protein sequence
the less helpful it will be as a phylogenetic marker of divergence,
even in theory, the farther back in time you try to go. The
"resolution" will get less and less to the point of uselessness more
quickly with shorter sequences. But, if you still want to use
sequences as short as "70-100aa" for phylogenetic analysis as you
suggest, that only helps my position out, not yours. We are not
talking "functional evolution" at this point, but phylogenetic
comparisons whose differences are supposed to correlate with times of
divergence from a common ancestor. It is your argument that "even if
every amino acid in the cytochrome c molecule were selectively neutral
and free to drift that it would take around 100 million years to
replace a single selectively neutral amino acid." That is your
position, correct?

Let's look at this assertion and see if it holds any water. Using
your suggested mutation rate of 1e-8 per base per generation and a
generation time for bacteria averaging, say, six hours, how long would
a 100aa sequence maintain a usable resolution if all positions were
"neutral" with respect to change, as you suggest? We would expect that
completely random sequences would share around a 25% identity with
each other, so a meaningful resolution would require more than 25%
sequence identity. So, let's say that out of the 300 nucleotides that
code for our 100aa protein, only 75 of them can change between
different lineages before our resolution becomes unusable. This of
course translates into a 25% change per lineage on average before the
resolution becomes useless. It would take 68,493 years to get one
mutation in this sequence at this mutation rate. Multiplying by 75
results in a maximum resolution of just 5 million years. Considering
that other life forms, to include those as diverse as snails, sponges,
and fish, were supposed to have diverged from bacteria some 600
million years ago, it only raises the question of why the resolution
of such small proteins has not been lost many times over by now
between such creatures. I suggested that the reason for this is that
such differences are not neutral at all but are functionally
beneficial depending on the different needs of the different
creatures. As such, these differences are maintained over time by
natural selection and are not allowed to "drift" randomly. You
basically agree with this in your reply when you assert that these
changes are "selectable". If they are selectable and not neutral,
then how are you so sure that clustering supports your notion of
common descent over my notion of original design? Different clusters
are not so much the result of time since a common ancestor as they are
the result of selectably maintained functions as per similarities and
differences in functional needs of the individual types of creatures.
How do you get around this argument? Can you show that the
similarities and differences are not so much based on similarities and
differences in functional needs?
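
For concreteness, the arithmetic above can be reproduced in a few lines
of Python (a sketch only, using the round values as stated in this post;
it assumes the 68,493-year figure refers to one mutation at a given base):

# Sketch of the resolution arithmetic above (values as stated in the post).
MUTATION_RATE = 1e-8                       # mutations per base per generation
GENERATIONS_PER_YEAR = 365 * 24 / 6.0      # six-hour bacterial generations

# Average wait for one mutation at a given base (assumed reading)
years_per_base_change = 1 / (MUTATION_RATE * GENERATIONS_PER_YEAR)
print(round(years_per_base_change))        # ~68,493 years

# 75 of the 300 coding nucleotides may change before the 25% random
# baseline identity is reached and resolution is lost
print(75 * years_per_base_change / 1e6)    # ~5.1 million years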

<snip>

> > The question is then, if bacteria can achieve such relatively rapid
> > neutral genetic drift, why are they not more wide ranging in their
> > cytochrome c sequences?
>
> Because of evolutionary constraint (aka selection), which is
> particularly strong when a large fraction of the amino acids are
> involved in contact with the substrate.

Exactly - And this is my whole point. The differences are not
necessarily so much the result of divergence over time as they are the
result of maintenance of beneficial variants over time.

> However, this strong selection
> bias is a two-edged sword. It makes the evolution *of* a functional
> sequence from a less useful one much more rapid (and certainly much more
> rapid than the subsequent neutral fixations), so long as each step in
> the process (each change) has a selectively significant effect. The
> stronger the selective effect, the more rapid the fixation.

Absolutely true. But, once the most useful sequence has been evolved
for the needs of a particular kind of creature, that sequence will be
maintained pretty much unchanged over time. This will create a
clustering effect for that type of creature as compared to other types
of creatures with different needs and therefore a cluster around a
different sequence that better suits the different needs of that type
of creature in its environment.

> > It seems that if these cytochrome c sequence
> > differences were really neutral differences, that various bacterial
> > groups, colonies, and species, would cover the entire range of
> > possible cytochrome c sequences to include that of mammals.
>
> There are degrees of constraint, from a position which must be invariant
> to a position which allows all possible change.

For cytochrome c, as we have already discussed at length, the degree
of constraint is very significant. After all, over 60% of the
positions in this molecule seem to be constrained to within 3 different
amino acids out of the 20 that are possible. Most people would call
that sort of constraint very significant. So, the clustered
differences that are seen between different creatures are most
certainly functionally significant. Even you have admitted as much
here.
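
To make the "60%" figure concrete, here is one way such a count can be
made, sketched in Python on a toy alignment (the sequences below are
invented purely for illustration; a real count would use the full set of
aligned cytochrome c sequences):

# Count positions "constrained" to 3 or fewer amino acids across an
# alignment. The toy alignment below is invented for illustration only.
alignment = [
    "GDVEKGKKIF",
    "GDVAKGKKIF",
    "GDVEKGRKIF",
    "GDIEKGKKLF",
]
columns = zip(*alignment)  # iterate over aligned positions
constrained = sum(1 for col in columns if len(set(col)) <= 3)
print(constrained / len(alignment[0]))  # fraction of constrained positions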



> But why would you expect bacterial sequences to cover the entire range
> of mammals if there is evolutionary constraint?

I wouldn't because obviously my main point is that there is
evolutionary constraint and that this functional constraint is what
creates the clustering of sequences according to variations in
functional needs of different organisms. Such constraints could
maintain such sequences pretty much indefinitely. There is absolutely
no reason then to favor the assumption that similarities and differences are the
result of divergence from common ancestry over the position that they
are the result of historically maintained differences in functional
needs.

> What I would expect is
> that all eucaryotic sequences should nest (if all eucaryotic sequences
> are due to a single horizontal event, of course) and should nest within
> the bacterial sequences appropriately depending upon the source of
> mitochondria. What would you expect?

I also would expect that they would nest based on similarities and
differences in functional needs.

> > Why are
> > they then so uniformly separated from all other "higher" species
> > unless the cytochrome sequences are functionally based and therefore
> > statically different due to the various needs of creatures that
> > inhabit different environments?
>
> I am not clear what you mean here. Are you asking "why do mammalian cyt
> c's form a nested series branching from other multi-cellular organisms"
> or are you talking about where bacteria fit in?

I'm talking about both since both scenarios would have been completely
obliterated in the time that is supposed to have occurred since their
common divergence if the changes were truly neutral. If they weren't
neutral, then my argument stands that these differences could have
always been there and do not necessitate an interpretation
of common ancestry.

> > For example, bacteria are thought to share a common ancestor with
> > creatures as diverse as snails, sponges, and fish dating all the way
> > back to the Cambrian period some 600 million years ago. All of these
> > creatures are thought to have been around quite a long time - ever
> > since the "Cambrian Explosion." In fact, they have all been around
> > long enough and are diverse enough to exhibit quite a range in
> > cytochrome c variation. Why then are their cytochrome c sequences so
> > clustered? Why don't bacteria, snails, fish, and sponges cover the
> > range of cytochrome c sequence variation if these variation
> > possibilities are in fact neutral?
>
> Because cytochrome c sequences are not all selectively neutral.

Again, you hit the nail on the head. That is my whole point. These
differences are not selectively neutral at all.

> There
> is a wide range of constraint, from very little to invariant.

The range may be wide, but the limits are significant nonetheless,
as previously noted.

> The net
> effect, in cytochrome c, is a relatively large degree of constraint,
> with most of the variance limited to particular regions of the protein
> or peptide sequence.

Exactly. And these regions are so extensive as to limit over 60% of
the amino acid positions to within 3 amino acids.

> Other proteins or peptides, especially larger
> ones, show a much lower degree of constraint.

True.

> The high degree of
> constraint is what makes cyt c so useful for deep phylogeny.

This is an unfounded assumption based on the notion that the theory of
evolution is already true. By itself, this idea cannot stand since
such nested hierarchies could also be just as easily explained by the
maintenance over time of original differences that are functionally
advantageous for different types of creatures in various environments.

> If cyt c
> were, in fact, as selectively neutral as fibrinogen peptide is, it would
> be useless for deep phylogeny.

Now why is this? You said, and I quote, that it would take, "around
100 million years to replace a single selectively neutral amino acid."
Did you mistype here?

> It would be useless after about 100
> million years.

Not if you were correct in your above quoted statement. You must
either be wrong here or there. Which is it? The only reason why it
would be useless after far less than 100 million years (say 5 or 10
million at the most) is that selectively neutral positions drift at a
very rapid rate consistent with my own calculations that you said were
wrong.

> If it were as constrained as hemoglobin, it would be
> useless for analyzing the relationship of bacteria, but still useful for
> analyzing relationships among vertebrates.

And yet hemoglobins are still clustered in widely divergent creatures
despite the lower level of constraint.

> > I propose that the clustered differences that are seen in genes and
> > protein sequences, such cytochrome c, are the result of differences in
> > actual function that actually benefit the various organisms according
> > to their individual needs.
>
> Then you would be wrong. The nesting pattern of similarity is related
> to time since divergence, not morphological or physiological or
> environmental similarity.

How do you know? It is possible, is it not, for natural selection to
maintain functional differences that are beneficial because of their
differences in different creatures?

> That is, humans and chimps will show more
> identity, despite being morphologically dissimilar and living in
> different environments -- such as Eskimos do.

Humans and chimps are not all that dissimilar in morphology or
environmental habitats. They are in fact quite similar in functional
needs.

> Frogs, which have been
> separated in S. America and Africa, but are morphologically similar,
> inhabit very similar environments, and are nearly identical
> physiologically, but have been separated for 40 million years (as
> opposed to 5 million for chimps and humans) will show more sequence
> variation.

It depends upon what sequences are involved. Are these cytochrome c
sequences or other functional genetic sequences, or are they
comparisons between neutral non-functional genetic sequences?

> Your hypothesis has been proposed, checked out, and
> discarded. When creationism is not bogus non-science it is refuted science.

Oh really?! Where is your reference for this notion that similar
creatures in different environments have significant differences that
are not functionally based as part of a functional gene? Haven't you
heard that different modern races of humans who live in different
environments have functionally different genes with the same type of
function and that these differences are maintained because they are
beneficially different in different environments?

"People whose ancestors hail from chilly regions have gene adaptations
that may allow their bodies to produce more heat while burning
calories, an international team of researchers reports, while those
with roots in warmer climates use calories more efficiently and
produce scant excess heat. Mitochondria, found in every cell, are
responsible for producing energy and play a key role in regulating
metabolism. The DNA in mitochondria is inherited maternally, and shows
"striking differences" from one geographic region to another, Douglas
C. Wallace from the University of California at Irvine and colleagues
note. To investigate whether adaptation to different climates might
explain this variation, the researchers analyzed gene sequences from
the mitochondria of 104 people who represented all the known major
types of mitochondrial DNA. In the Proceedings of the National Academy
of Sciences paper, Wallace and his team report that mitochondrial gene
variants may confer advantages in some climates, but disadvantages in
others. For instance, arctic and sub-arctic native peoples had
variants that programmed them to produce more heat, but put out less
energy. Their bodies were thus more efficient at keeping them warm.
Previous studies have shown that indigenous populations around the
North and South Poles tend to have a higher resting metabolism.
Tropical and sub-tropical natives made more efficient use of energy,
producing little heat."

Proceedings of the National Academy of Sciences
2002;10.1073/pnas.0136972100
http://www.healthexcel.com/docs/_native1.html

> > If the differences were in fact neutral
> > differences, there would be a vast overlap by now with complete
> > blurring of species' cytochrome c boundaries - even between species as
> > obviously different as humans and bacteria.
>
> Anybody who has studied this seriously has come to a quite different
> conclusion.

Based on what?

> Obviously, one certainly needs to worry about the
> re-ratting problem when one has a highly constrained protein. That is,
> since a particular aa site can only vary between 3 amino acids, for
> example, it is entirely possible for such a site to have gone from an
> ancestral amino acid to a lineage which has one of the other two
> possibilities, and be in a branch off of that lineage which has reverted
> back to the original aa. There are statistical mechanisms for
> recognizing such re-rats. As long as a significant number of amino
> acids are examined and constraint is not absolute in all of them, it is
> usually possible to identify such cases.

Oh really? And how do you tell the difference between functional
maintenance and neutral changes that would have been "re-rated"
thousands of times by now if they were in fact neutral?

> > Because of this, sequence
> > differences may not be so much the result of differences due to random
> > mutation over time as they are due to differences in the functional
> > needs of different creatures. I think that the same can be said of
> > most if not all phylogenies that are based on genotypic differences
> > between creatures.
>
> Again, this can be and has been tested. The nested pattern of
> relationship generated by statistical methods of minimum parsimony, etc.
> do not form a pattern that is a function of functional needs. It forms
> a historical pattern related to time since divergence.

And how is this determined? How do you know that these trees are not
based on functional needs? I've never seen this explained in a way in
which I could understand it.

> > In 1993, Patterson, Williams, and Humphries, scientists with the
> > British Museum, reached the following conclusion in their review of
> > the congruence between molecular and morphologic phylogenies:
> >
> > "As morphologists with high hopes of molecular systematics, we end
> > this survey with our hopes dampened. Congruence between molecular
> > phylogenies is as elusive as it is in morphology and as it is between
> > molecules and morphology. . . . Partly because of morphology's long
> > history, congruence between morphological phylogenies is the exception
> > rather than the rule. With molecular phylogenies, all generated
> > within the last couple of decades, the situation is little better.
> > Many cases of incongruence between molecular phylogenies are
> > documented above; and when a consensus of all trees within 1% of the
> > shortest in a parsimony analysis is published structure or resolution
> > tends to evaporate."
>
> This was written a decade ago.

Do you know of anything better after 10 years that adequately refutes
Patterson et. al.?

Sean
www.naturalselection.0catch.com

Bill Rogers

Jan 12, 2004, 7:36:45 AM
to
seanpi...@naturalselection.0catch.com (Sean Pitman) wrote in message news:<80d0c26f.04011...@posting.google.com>...

> Howard Hershey <hers...@indiana.edu> wrote in message news:<3FF89BDA...@indiana.edu>...
> > >>Sean Pitman wrote:
>
<snip points others will doubtless cover>
> > > In 1993, Patterson, Williams, and Humphries, scientists with the
> > > British Museum, reached the following conclusion in their review of
> > > the congruence between molecular and morphologic phylogenies:
> > >
> > > "As morphologists with high hopes of molecular systematics, we end
> > > this survey with our hopes dampened. Congruence between molecular
> > > phylogenies is as elusive as it is in morphology and as it is between
> > > molecules and morphology. . . . Partly because of morphology's long
> > > history, congruence between morphological phylogenies is the exception
> > > rather than the rule. With molecular phylogenies, all generated
> > > within the last couple of decades, the situation is little better.
> > > Many cases of incongruence between molecular phylogenies are
> > > documented above; and when a consensus of all trees within 1% of the
> > > shortest in a parsimony analysis is published structure or resolution
> > > tends to evaporate."
> >
> > This was written a decade ago.
>
> Do you know of anything better after 10 years that adequately refutes
> Patterson et. al.?

No. Not much has changed in 10 years in that regard. But you have to
see that quote in context. If the expectation is that you should have
a perfect match between the different phylogenies, then, sure that
expectation is disappointed. In a particular context you might even
say "there are great discrepancies between morphological and molecular
phylogenies." Fine. But if the expectation is the null hypothesis,
that molecular and morphological phylogenies should be completely
random with respect to each other, then that expectation is
disappointed massively. The correlation between morphological and
molecular phylogenies is extremely strong, when compared to the null
hypothesis.


>
> Sean
> www.naturalselection.0catch.com

howard hershey

Jan 12, 2004, 2:13:20 PM
to

Sean Pitman wrote:
> Howard Hershey <hers...@indiana.edu> wrote in message news:<3FF89BDA...@indiana.edu>...
>
>>>>Sean Pitman wrote:
>>
>>>Now, cytochrome c phylogenies are generally based on analysis of
>>>certain subunits of cytochrome c which range in number of amino acids
>>>up to a maximum of about 600 or so.
>>
>>Having taken a look at a sampling of the more than 200 cytochrome c's
>>(including many bacterial versions) currently available, I can say the following:
>>In no case that I saw is the cytochrome c *portion* of any protein
>>itself much larger than about 100aa. That is, all
>>sequence over about 70-100aa is unrelated to cyt c (or a duplicate of it).
>>In some cases, almost entirely in bacterial sequences, the
>>cytochrome c sequence is embedded within a multifunctional
>>protein. Sometimes the multifunctional protein has other activities in
>>the amino end, in other cases the unrelated activities are at the
>>carboxy terminus. Sometimes the cyt c sequence is duplicated within
>>these larger peptides.
>>
>>Shame on you, Sean, for not even following *your own rules* for
>>determining minimum sequence or functional sequence size!
>
>
> Don't you understand? In this situation it is to your advantage that
> the length be as long as possible. I was making things as good as I
> possibly could for your position.


What a load of crap, Sean. And you know it. Although you have *still*
not yet specified exactly how one goes about determining values for the
minimum amino acid number or the minimum required amino acid number
(other than wave your hands and toss out a number), the only thing that
is clear from your pseudocalculations is that the *larger* either number
is, the more improbable it is that a system could evolve. This idea
that the larger the size (and number?) of the proteins involved in a
system, the more impossible it is for it to arise from a random sequence
by a random walk runs throughout *every* example you give. And now you
are saying the exact opposite? Can't you hold still for a second? How
do you expect me to reach the goal line when you keep switching sides?

> The shorter the protein sequence
> the less helpful it will be as a phylogenetic marker of divergence,
> even in theory, the farther back in time you try to go.

That depends upon the evolutionary constraint on sequence. As I have
repeatedly said, the *reason* why cytochrome c is useful as a
phylogenetic marker for deep phylogenies is because it is under strong
evolutionary constraint, unlike, say, the fibrinogen peptide sequence.

> The
> "resolution" will get less and less to the point of uselessness more
> quickly with shorter sequences.

The size of the sequence is irrelevant (other than that proteins that
show significant constraint are much more likely to be small). The
amount of evolutionary constraint on the sequence is not a direct
function of size. Fibrinogen peptide is actually smaller than
cytochrome c and shows essentially no constraint. Hemoglobin's globins
are larger. Both show less constraint than cytochrome c. Histones H3
and H4 are larger (but not much larger, because for there to be
significant constraint it is important that most of the protein be in
contact with the substrate and that requires a small protein) than the
minimum cytochrome c and show more constraint.

> But, if you still want to use
> sequences as short as "70-100aa" for phylogenetic analysis as you
> suggest, that only helps my position out, not yours. We are not
> talking "functional evolution" at this point, but phylogenetic
> comparisons whose differences are supposed to correlate with times of
> divergence from a common ancestor.

Yes. And generally does.

> It is your argument that "even if
> every amino acid in the cytochrome c molecule were selectively neutral
> and free to drift that it would take around 100 million years to
> replace a single selectively neutral amino acid." That is your
> position, correct?
>
> Let's look at this assertion and see if it holds any water. Using
> your suggested mutation rate of 1e-8 per base per generation and a
> generation time for bacteria averaging say, six hours, how long would
> a 100aa sequence maintain a usable resolution if all positions were
> "neutral" with respect to change as you suggest? We would expect that
> completely random sequences would share around a 25% identity with
> each other, so a meaningful resolution would require more than 25%
> sequence identity. So, let's say that out of the 300 nucleotides that
> code for our 100aa protein that only 75 of them can change between
> different lineages before our resolution becomes unusable. This of
> course translates into a 25% change per lineage on average before the
> resolution becomes useless. It would take 68,493 years to get one mutation
> at any given base of this sequence at this mutation rate. Multiplying by 75
> results in a maximum resolution of just 5 million years. Considering
> that other life forms, including those as diverse as snails, sponges,
> and fish, were supposed to have diverged from bacteria some 600
> million years ago, one must ask why the resolution
> of such small proteins has not been lost many times over by now
> between such creatures.

Rather than try to refute your pseudocalculations, I will only say that
real-world evidence always trumps hypothetical calculations. And if you
plot the rate of amino acid substitutions as "amino acid changes per 100
residues" against "millions of years since divergence" based on the
fossil record, you get reasonably straight lines on a linear/linear plot
(no log plot) with quite small standard errors for the data points.
This can be done for fibrinopeptides, hemoglobin, and cytochrome c.
Each protein gives a different line. Fibrinopeptides are peptides
clipped from fibrinogen to activate clot formation and essentially all
its amino acids are selectively neutral. The rate of aa substitution
from the line plot of this data gives a value of 9.0 x 10^-9/yr. That
is, for a protein sequence that is quite close to a completely neutral
one, complete (100%) substitution will take, on average, about 10^8 yrs
or 100 million years. *That* is what I based my estimate on, not some
hypothetical calculation from an estimate of mutation rates. BTW, the
rate of aa substitution for alpha globin of hemoglobin is 1.4 x 10^-9
and cytochrome c is 0.3 x 10^-9 and histone H4 is 0.006 x 10^-9. Note
again the fact that the rate of substitution is not a function of size
(other than that small proteins tend to be more constrained *when* more
of the aa's are in contact with the substrate).
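
Those quoted rates translate directly into average turnover times; a
short Python sketch (the rates are exactly as given above, the
conversion is the only thing added):

# Convert the per-site substitution rates quoted above into the average
# time for ~100% replacement of the sequence by neutral drift.
rates = {                      # substitutions per site per year
    "fibrinopeptides": 9.0e-9,
    "alpha globin":    1.4e-9,
    "cytochrome c":    0.3e-9,
    "histone H4":      0.006e-9,
}
for protein, rate in rates.items():
    years = 1.0 / rate         # expected wait per site = full average turnover
    print(f"{protein}: ~{years / 1e6:,.0f} million years")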


> I suggested that the reason for this is that
> such differences are not neutral at all but are functionally
> beneficial depending on the different needs of the different
> creatures.

You can suggest it all you want. But the evidence argues against this
hypothesis. If this hypothesis were true, we would expect to see even
greater variance in rates of substitution within lineages than we do.
And we would have expected to observe, to a much greater extent than we
do, the type of evolutionary change that Denton refuted rather than the
type that Kimura would have predicted.

In case you don't know what I am talking about (a sure bet) it is the
difference between a model that looks like this:

Denton's (false) view of sequence evolution (but not of other types of
evolution):

new amoeba    new fish       new amphibian     new reptile
    |             |                |                  |
old amoeba -> old fish ----> old amphibian ----> old reptile

That is, the amount of aa difference is primarily due to selection and
once you have an ancestral (old in the above) form of
amoeba/fish/amphibian/reptile that the sequence of the protein
essentially remains close to the primitive sequence first formed in that
group. That is, like some other aspects of phenotype, sequence
evolution produces a structure and then pretty much sticks with it in
all organisms that retain the more primitive features. This is close to
what you would expect if sequence were closely related to structure of
the organism; that is, selection is the main cause of sequence difference.

In Kimura's model, most sequence change is due to selectively neutral or
near neutral change. This does not mean that only *some* amino acids
can change and others can't *at all*. It means that there are
differences in the rate of change at particular sites. The net result
of these individual rates gives the overall rate of change. In this
model, which Denton doesn't even touch, we would expect the following:


new amoeba     new fish      new amphibian     new reptile
     \            \                \               /
      \            \                \             /
       \            \                \           /
        \            \                \         /
         \            \                \       /
          \            \                \     /
           \            \           LCA of amphibians
            \            \            and reptiles
             \            \            /
              \            \          /
               \            \        /
                \            \      /
                 \         LCA of fish and amphibians
                  \           /
                   \         /
                    \       /
                     \     /
             LCA of amoeba and fish

LCA = Last common ancestor

In this model, the amount of difference expected between any two modern
groups is determined by the number of up and down spaces from one of
them down to the common ancestor and back up to the other group. This,
of course, is essentially the pattern that *was* found. This is the
pattern one would expect if differences were due to most changes being
selectively neutral changes from a common ancestor.
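
The expectation under this model can be sketched in a few lines of
Python (the rate, sequence length, and divergence times below are
illustrative assumptions, not measured values):

# Under the Kimura model, expected differences between two modern lineages
# grow with the total branch length back to their common ancestor, i.e.
# roughly 2 * rate * time-since-divergence. All values are illustrative.
RATE = 1e-9        # assumed neutral substitutions per site per year
SITES = 100        # assumed protein length in amino acids

def expected_differences(divergence_years, sites=SITES, rate=RATE):
    # change accumulates independently along both branches, hence the 2
    return 2 * rate * divergence_years * sites

print(expected_differences(5e6))    # e.g. a ~5 Myr split (chimp/human)
print(expected_differences(40e6))   # e.g. a ~40 Myr split (the frogs below)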


> As such, these differences are maintained over time by
> natural selection and are not allowed to "drift" randomly. You
> basically agree with this in your reply when you assert that these
> changes are "selectable". If they are selectable and not neutral,
> then how are you so sure that clustering supports your notion of
> common descent over my notion of original design?

The expected result if selectively neutral changes were the *only* thing
that happened would be that, as a Poisson process, the variance and the
mean in the rate of substitution should, theoretically, be the same. In
real-world observations, the variance is somewhat higher, and in some
lineages significantly higher, but nowhere near as high as it would
be if selection were the only cause of sequence difference.
Additionally, there are a number of reasons for the somewhat higher
variance that do not include selection or fluctuating neutral space
(including non-independence of some changes, population bottlenecks).
But the significant point is that the pattern of differences definitely
looks a lot more like that described by Kimura than the one described by
Denton.
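
The Poisson comparison above is easy to sketch: under strict neutrality
the index of dispersion (variance over mean) of substitution counts
across lineages should be near 1 (the counts below are hypothetical,
for illustration only):

# Index of dispersion R = variance / mean of substitution counts across
# lineages; R ~ 1 for a pure Poisson (strictly neutral) process, while
# R >> 1 suggests additional sources of variance. Counts are invented.
from statistics import mean, pvariance

counts = [12, 15, 9, 14, 11, 18, 10, 13]   # hypothetical substitutions/lineage
R = pvariance(counts) / mean(counts)
print(f"index of dispersion R = {R:.2f}")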

> Different clusters
> are not so much the result of time since a common ancestor as they are
> the result of selectably maintained functions as per similarities and
> differences in functional needs of the individual types of creatures.
> How do you get around this argument? Can you show that the
> similarities and differences are not so much based on similarities and
> differences in functional needs?

If you look at molecular evolution in viruses, especially RNA viruses
that have a much higher rate of mutation because they lack repair
systems, you can observe the same pattern of a relatively constant rate
of sequence change over time. For example, the rate of nucleotide
sequence change in the NS genes of influenza virus (from samples saved over
the last 60 years) is 1.73 +/- 0.08 nucleotide substitutions/year. The
+/- shows the degree to which the rate of change is relatively constant
rather than episodic due to changed environments (Buonagurio, D.A., et.
al. Science 232:980-982, 1986). Similar rapid rates of substitution
occur in the polymerase enzyme of the AIDS virus. Again, if there were
strong selection in favor of a *specific* sequence, we would not be
observing this type of change over time.
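
The constancy claim amounts to fitting a straight line of substitutions
against sampling date; a minimal sketch (the data points below are
invented, only the ~1.7/year slope echoes the figure quoted above):

# Molecular-clock check: regress cumulative substitution counts against
# sampling year; a near-constant rate gives a straight line whose slope
# is the rate in substitutions/year. Data invented for illustration.
import numpy as np

years = np.array([1940, 1950, 1960, 1970, 1980, 1990])
subs  = np.array([0, 17, 35, 52, 69, 86])     # hypothetical cumulative counts
slope, intercept = np.polyfit(years, subs, 1)
print(f"estimated rate: {slope:.2f} substitutions/year")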


>
> <snip>
>
>>>The question is then, if bacteria can achieve such relatively rapid
>>>neutral genetic drift, why are they not more wide ranging in their
>>>cytochrome c sequences?
>>
>>Because of evolutionary constraint (aka selection), which is
>>particularly strong when a large fraction of the amino acids are
>>involved in contact with the substrate.
>
>
> Exactly - And this is my whole point.

No it is not. Your point is that the changes are due to selective
effects. The pattern says that the sequences seen are largely due to
neutral change reflecting common ancestry rather than environmental
conditions.

> The differences are not
> necessarily so much the result of divergence over time as they are the
> result of maintenance of beneficial variants over time.

The sequence *similarities* are due to selection for function (all
cytochrome c's have a shared function, which is selected for). The
*differences* are largely due to divergence over time. The *rate* of
divergence is a function of the amount of evolutionary constraint on the
molecule (how much of the molecule is involved in function), but only
the overall *rate* of change is affected and determined by constraint.

>>However, this strong selection
>>bias is a two-edged sword. It makes the evolution *of* a functional
>>sequence from a less useful one much more rapid (and certainly much more
>>rapid than the subsequent neutral fixations), so long as each step in
>>the process (each change) has a selectively significant effect. The
>>stronger the selective effect, the more rapid the fixation.
>
>
> Absolutely true. But, once the most useful sequence has been evolved
> for the needs of a particular kind of creature, that sequence will be
> maintained pretty much unchanged over time.

No. *After* a useful sequence has been evolved, the major change, so
long as the protein has the *same* or very similar function (as is true
for cytochrome c), will be the type of evolutionary drift seen in the
Kimura model of evolutionary change. One will not expect much of the
type of change seen in the Denton model of evolutionary change. This
does not exclude some level of change for selective reasons (say changes
that stabilize the enzyme at different temperatures). But *most* change
subsequent to the ancestral evolved form will be due to selectively
neutral changes.

> This will create a
> clustering effect for that type of creature as compared to other types
> of creatures with different needs and therefore a cluster around a
> different sequence that better suits the different needs of that type
> of creature in its environment.

And you have not presented a shred of evidence in favor of this
"selection is all" idea.

>>>It seems that if these cytochrome c sequence
>>>differences were really neutral differences, that various bacterial
>>>groups, colonies, and species, would cover the entire range of
>>>possible cytochrome c sequences to include that of mammals.
>>
>>There are degrees of constraint, from a position which must be invariant
>>to a position which allows all possible change.
>
>
> For cytochrome c, as we have already discussed at length, the degree
> of constraint is very significant. After all, over 60% of the
> positions in this molecule seem to be constrained to within 3 different
> amino acids out of the 20 that are possible. Most people would call
> that sort of constraint very significant.

And that *does* affect the overall *rate* of change. But only the
overall rate. The pattern of difference still varies more as a
consequence of time since divergence than as a consequence of
environmental similarity (similar feeding pattern, environment, activity
pattern). That is, the cyt c's of whales are closer to those of hippos
and deer than those of seals and walruses or manatees; the cytochrome
c's of frogs are roughly equidistant to humans and lizards; the
cytochrome c's of clams are more similar to squid than to tunicates or
barnacles or anemones.

> So, the clustered
> differences that are seen between different creatures are most
> certainly functionally significant. Even you have admitted as much
> here.

Actually the *evidence* says that most of the differences are not
functionally significant.

>>But why would you expect bacterial sequences to cover the entire range
>>of mammals if there is evolutionary constraint?
>
>
> I wouldn't because obviously my main point is that there is
> evolutionary constraint and that this functional constraint is what
> creates the clustering of sequences according to variations in
> functional needs of different organisms. Such constraints could
> maintain such sequences pretty much indefinitely. There is absolutely
> no reason then to favor the assumption that similarities and differences are the
> result of divergence from common ancestry over the position that they
> are the result of historically maintained differences in functional
> needs.

Similarity is due to and a consequence of the degree of functional
constraint. Differences (including all the mosaic additions seen in
various bacterial sequences) are *largely* due to pure chance
(selectively neutral drift) as long as function remains the same as the
ancestral sequence. That is what the evidence says.


>
>
>>What I would expect is
>>that all eucaryotic sequences should nest (if all eucaryotic sequences
>>are due to a single horizontal event, of course) and should nest within
>>the bacterial sequences appropriately depending upon the source of
>>mitochondria. What would you expect?
>
>
> I also would expect that they would nest based on similarities and
> differences in functional needs.

What *functional* differences in needs do you see for the various
cytochrome c's of, say, mammals, or tetrapods, or vertebrates or
deuterostomes, or animals that would result in the observed pattern of
difference? Why are whale cyt c's more like that of the closely
related, but living in very different environments, artiodactyls rather
than like that of the more distantly related, but living in similar
environments and eating a similar diet, seals?

>>>Why are
>>>they then so uniformly separated from all other "higher" species
>>>unless the cytochrome sequences are functionally based and therefore
>>>statically different due to the various needs of creatures that
>>>inhabit different environments?
>>
>>I am not clear what you mean here. Are you asking "why do mammalian cyt
>>c's form a nested series branching from other multi-cellular organisms"
>>or are you talking about where bacteria fit in?
>
>
> I'm talking about both since both scenarios would have been completely
> obliterated in the time that is supposed to have occurred since their
> common divergence if the changes were truly neutral. If they weren't
> neutral, then my argument stands that these differences could have
> always been there and do not necessitate an interpretation
> of common ancestry.

Selection largely acts as a constraining force affecting the *rate* of
change. The first fully functional cyt c to evolve (the long gone
ancestral cyt c) was selected *for* on the basis of utility in directing
electron transfer events. After the point of optimal utility was
reached, selection acted as a constraining force to *retain* functional
utility. The question is whether the pattern of change observed in
modern organisms is consistent with modern organisms having cyt c's
derived from that common ancestor's sequence (with selection acting
largely to retain *function* and retain sequence only to the extent that
sequence is crucial to function, with neutral drift allowing change ala
the Kimura model at a *rate* which will differ from one functional
sequence to another). The evidence largely supports the last model.

>>>For example, bacteria are thought to share a common ancestor with
>>>creatures as diverse as snails, sponges, and fish dating all the way
>>>back to the Cambrian period some 600 million years ago. All of these
>>>creatures are thought to have been around quite a long time - ever
>>>since the "Cambrian Explosion." In fact, they have all been around
>>>long enough and are diverse enough to exhibit quite a range in
>>>cytochrome c variation. Why then are their cytochrome c sequences so
>>>clustered? Why don't bacteria, snails, fish, and sponges cover the
>>>range of cytochrome c sequence variation if these variation
>>>possibilities are in fact neutral?
>>
>>Because cytochrome c sequences are not all selectively neutral.
>
>
> Again, you hit the nail on the head. That is my whole point. These
> differences are not selectively neutral at all.

All this affects is the *rate* of change at those sites. That can vary
widely from protein to protein as a function of size and function, but
the net effect over the entire protein seems to be relatively constant
over time.

>>There
>>is a wide range of constraint, from very little to invariant.
>
>
> The range may be wide, but the limits are significant nonetheless,
> as previously noted.
>
>
>>The net
>>effect, in cytochrome c, is a relatively large degree of constraint,
>>with most of the variance limited to particular regions of the protein
>>or peptide sequence.
>
>
> Exactly. And these regions are so extensive as to limit over 60% of
> the amino acid positions to within 3 amino acids.

Again, it is a matter of *rate* of change over time. That *rate* is
relatively consistent.


>
>
>> Other proteins or peptides, especially larger
>>ones, show a much lower degree of constraint.
>
>
> True.
>
>
>>The high degree of
>>constraint is what makes cyt c so useful for deep phylogeny.
>
>
> This is an unfounded assumption based on the notion that the theory of
> evolution is already true. By itself, this idea cannot stand since
> such nested hierarchies could also be just as easily explained by the
> maintenance over time of original differences that are functionally
> advantageous for different types of creatures in various environments.

Nope. Maintenance over time of original differences would produce a
pattern similar either to Denton's model (if one assumed an evolutionary
relationship) or to a model where one would not be able to see *any*
pattern or a model where sequence was a function of environment. One
does not see the latter patterns. One sees the pattern consistent with
non-selective change as a function of time since divergence.


>
>
>>If cyt c
>>were, in fact, as selectively neutral as fibrinogen peptide is, it would
>>be useless for deep phylogeny.
>
>
> Now why is this? You said, and I quote, that it would take, "around
> 100 million years to replace a single selectively neutral amino acid."
> Did you mistype here?

I may have misstated. That should be "100 million years to replace, on
average, *each* selectively completely neutral amino acid by neutral
drift." That is, 100% replacement.


>
>
>> It would be useless after about 100
>>million years.
>
>
> Not if you were correct in your above quoted statement. You must
> either be wrong here or there. Which is it? The only reason why it
> would be useless after far less than 100 million years (say 5 or 10
> million at the most) is that selectively neutral positions drift at a
> very rapid rate consistent with my own calculations that you said were
> wrong.

Again. I am using the *observed* rate and amount of change in the
fibrinogen peptide as an estimate of change at selectively neutral amino
acid sites. Your calculations are irrelevant in the face of evidence.

>>If it were as constrained as hemoglobin, it would be
>>useless for analyzing the relationship of bacteria, but still useful for
>>analyzing relationships among vertebrates.
>
>
> And yet hemoglobins are still clustered in widely divergent creatures
> despite the lower level of constraint.

Alpha globin's rate of neutral change lies between that of cyt c and
fibrinogen peptides. But the pattern gives, largely, the *same* nested
hierarchy as cyt c does, and to the extent it is possible to tell, the
same nested hierarchy as fibrinogen peptides do. That is, whales still
nest with artiodactyls and not seals. Humans and chimps who separated
some 5-6 million years ago still show more similarity in sequence than
very similar frogs that were divided by the Atlantic Ocean's opening up
40 million years ago. Humans and chimps (ground-dwelling omnivores)
show more sequence similarity with leaf-eater monkeys (arboreal
herbivores) than with raccoons (ground-dwelling omnivores). No evidence
of your elusive nested difference as a function of functional
constraint. But you tell me where the evidence of a nested hierarchy
that is a function of the environment, diet, etc. that an organism lives
in rather than common descent exists.

>>>I propose that the clustered differences that are seen in genes and
>>>protein sequences, such cytochrome c, are the result of differences in
>>>actual function that actually benefit the various organisms according
>>>to their individual needs.
>>
>>Then you would be wrong. The nesting pattern of similarity is related
>>to time since divergence, not morphological or physiological or
>>environmental similarity.
>
>
> How do you know? It is possible, is it not, for natural selection to
> maintain functional differences that are beneficial because of their
> differences in different creatures?

Of course. But then you would see a different nested hierarchy. One
related to environmental conditions the organism lives in, as selection
produces similar sequences to match the environment. You don't.

>>That is, humans and chimps will show more
>>identity, despite being morphologically dissimilar and living in
>>different environments -- such as Eskimos do.
>
>
> Humans and chimps are not all that dissimilar in morphology or
> environmental habitats. They are in fact quite similar in functional
> needs.
>
>
>> Frogs, which have been
>>separated in S. America and Africa, but are morphologically similar,
>>inhabit very similar environments, and are nearly identical
>>physiologically, but have been separated for 40 million years (as
>>opposed to 5 million for chimps and humans) will show more sequence
>>variation.
>
>
> It depends upon what sequences are involved. Are these cytochrome c
> sequences or other functional genetic sequences, or are they
> comparisons between neutral non-functional genetic sequences?
>
>
>>Your hypothesis has been proposed, checked out, and
>>discarded. When creationism is not bogus non-science it is refuted science.
>
>
> Oh really?! Where is your reference for this notion that similar
> creatures in different environments have significant differences that
> are not functionally based as part of a functional gene? Haven't you
> heard that different modern races of humans who live in different
> environments have functionally different genes with the same type of
> function and that these differences are maintained because they are
> beneficially different in different environments?

Look, I am not saying that selection for sequence difference cannot
happen. I am saying that such events are rare and rapid and also
rapidly reversible. And these selective differences usually do not
involve changing 'hundreds' of amino acids (usually only one or two are
involved). They are unable to explain *most* observed difference in
sequence. The difference below, a change that affects the efficiency of
utilizing fuel, is a change that can certainly have selective utility.
I would also predict that very different organisms (wrt history)
living in similar environments might have the *same* selective changes
in the *same* one or two amino acids to accomplish the same slight
functional difference. But they would have many more differences that
are a consequence of history alone. The thing is that changing one aa
or two aa will not produce the amount of sequence difference that is
seen in sequences from very different organisms that share the same
environment. It is the massive *amount* of change that is selectively
neutral, not the smaller amount of change that may be selected for,
that determines the overall pattern.
Selection is largely conservative, but is much more powerful than
random drift. So when there is selection for change, as in the
evolution of mitochondria that are less efficient, thus liberating more
heat, it is relatively rapid. That is part of the reason why one does
not see the pure Poisson randomness that would say that *all*
differences (rather than most) are selectively neutral. But the fact
remains that the overall pattern is NOT one of selection for sequences
where most changes have selective effect, it is a pattern that largely
says 'historical neutral changes'.

The evidence shows that the pattern is consistent with a historical one
rather than with whatever selective model you have in mind.

>
>> Obviously, one certainly needs to worry about the
>>re-ratting problem when one has a highly constrained protein. That is,
>>since a particular aa site can only vary between 3 amino acids, for
>>example, it is entirely possible for such a site to have gone from an
>>ancestral amino acid to a lineage which has one of the other two
>>possibilities, and be in a branch off of that lineage which has reverted
>>back to the original aa. There are statistical mechanisms for
>>recognizing such re-rats. As long as a significant number of amino
>>acids are examined and constraint is not absolute in all of them, it is
>>usually possible to identify such cases.
>
>
> Oh really? And how do you tell the difference between functional
> maintenance and neutral changes that would have been "re-rated"
> thousands of times by now if they were in fact neutral?

Degree of retention.

>>>Because of this, sequence
>>>differences may not be so much the result of differences due to random
>>>mutation over time as they are due to differences in the functional
>>>needs of different creatures. I think that the same can be said of
>>>most if not all phylogenies that are based on genotypic differences
>>>between creatures.
>>
>>Again, this can be and has been tested. The nested pattern of
>>relationship generated by statistical methods of minimum parsimony, etc.
>>do not form a pattern that is a function of functional needs. It forms
>>a historical pattern related to time since divergence.
>
>
> And how is this determined? How do you know that these trees are not
> based on functional needs? I've never seen this explained in a way in
> which I could understand it.

Because there are a large number of examples of relatively recently
diverged organisms that have dramatically changed their environment
(such as whales) that share that new environment with other organisms
that diverged from a different ancestry (seals) but have a very
different environment from their closer relations (artiodactyls). Or
plants that have evolved large tree-like forms rapidly from smaller
forms. In essentially every test of this type, the pattern remains
consistent with common ancestry rather than shared environment. That
does not mean that there can't be one or two changes in aa that are a
consequence of selection for particular environmental need (such as a
need for a less efficient metabolism to generate more heat in both
whales and seals). But these *selected for* changes seem to be in the
minority and consequently the overall pattern of sequence change is one
of common descent and history as evidenced by neutral changes rather
than selective ones determined by the environment the organisms live in.

[snip]

Sean Pitman

Jan 14, 2004, 6:02:34 AM
to
RobinGoodfellow <lmuc...@yahoo.com> wrote in message news:<bt8i6p$r9h$1...@news01.cit.cornell.edu>...

> > Sean Pitman wrote:
> >
> > With all due respect, what is your area of professional training? I
> > mean, after reading your post I dare say that you are not only weak in
> > biology, but statistics as well. Certainly your numbers and
> > calculations are correct, but the logic behind your assumptions is
> > extraordinarily fanciful. You sure wouldn't get away with such
> > assumptions in any sort of peer reviewed medical journal or other
> > statistically based science journal - that's for sure. Of course, you
> > may have good success as a novelist . . .
>
> Tsk, tsk... I thank you for the career advice. I'll keep it in mind,
> should my current stint in computer science fall through. I wouldn't go
> so far as to say that Monte-Carlo methods are my specialty, but I will
> say that my own research and the research of half my colleagues would be
> non-existent if they worked the way you think they do.

Hmmmm, so what has your research shown? I've seen nothing from the
computer science front that shows how anything new, such as a new
software program, beyond the lowest levels of functional complexity
can be produced by computers without the input of an intelligent mind.
Your outlandish claims for the result of research done so far, such
as the Lenski experiments, are just over the top. They don't
demonstrate anything even close to what you claim they demonstrate
(See Below).

> >>I'll try to address some of the mistakes you've made below, though I
> >>doubt that I can do much to dispel your misconceptions. Much of my
> >>reply will not even concern evolution in a real sense, since I wish to
> >>highlight and address the mathematical errors that you are making.
> >
> > What you ended up doing is highlighting your misunderstanding of
> > probability as it applies to this situation as well as your amazing
> > faith in an extraordinary stacking of the deck which allows evolution
> > to work as you envision it working. Certainly, if evolution is true
> > then you must be correct in your views. However, if you are correct
> > in your views as stated then it would not be evolution via mindless
> > processes alone, but evolution via a brilliant intelligently designed
> > stacking of the deck.
>
> Exactly what views did I state, Sean? Other than that your calculations
> are, to put it plainly, irrelevant. Not even wrong - just irrelevant.
>
> Yes, the example I give below incredibly stacks the deck in my favor.
> It ought to. It is what is called a "counter-example". It falsifies
> the hypothesis that your "model" of evolution is correct. Now aren't
> you glad you proposed something falsifiable?

Come again? How does your stacking the deck via the use of
intelligent design, since there is no other logical way to stack the
deck so that your scenario will actually work, disprove my position?
My hypothesis is dependent on the far more likely scenario that the
deck is not stacked as you suggest, but is in fact much more random
than you seem to think it is. Certainly the ONLY way evolution could
work is if the deck was stacked, but then this would be easily
detected as evidence of intelligent design, not the normal
understanding of evolution as a mindless non-directed process.



> > This distribution of states has very little if anything to do with how
> > much time it takes to find one of them on average. The starting point
> > certainly is important to initial success, but it also has very little
> > if anything to do with the average time needed to find more and more
> > beneficial functions within that same level of complexity.
>
> Except in every real example of a working Monte-Carlo procedure, where
> the distribution and starting point have *everything* to do whether such
> a procedure is successful or not.

You mean that the stacking of the deck has everything to do with
whether or not an "evolutionary" scenario will succeed. Certainly
this would be true, but such a stacking of the deck has no resemblance
to reality. You must ask yourself about the likelihood that one will
find such a stacked deck in real life outside of intelligent design .
. .

> > For
> > example, if all the beneficial states were clustered together in one
> > or two areas, the average starting point, if anything, would be
> > farther way than if these states were distributed more evenly
> > throughout the sequence space. So, this leaves the only really
> > relevant factor - the types of steps and the number of steps per unit
> > of time. That is the only really important factor in searching out
> > the state space - on average.
>
> *Sigh*. The problem is that the model *you* are proposing (one I think
> is silly) is of a random walk on a specific frozen sequence space
> with beneficial sequences as points in that space. It does not deal
> with an "average" distribution, and an "average" starting point, but
> with one very specific distribution of beneficial sequences and one very
> specific starting point.

Consider the scenario where there are 10 ice cream cones on the
continental USA. The goal is for a blind man to find as many as he
can in a million years. It seems that what you are suggesting is that
the blind man should expect that the ice cream cones will all be
clustered together and that this cluster will be within arm's reach of
where he happens to start his search. This is simply a ludicrous
notion outside of intelligent design. My hypothesis, on the other
hand, suggests that these 10 ice cream cones will have a more random
distribution with hundreds of miles separating each one, on average.
An average starting point of the blind man may, by a marvelous stroke
of luck, place him right beside one of the 10 cones. However, after
finding this first cone, how long, on average, will it take him to
find any of the other 9 cones? That is the question here. The very
low density of ice cream cones translates into a marked increase in
the average time required to find them. Now, if there were billions
upon billions of ice cream cones all stuffed into this same area, then
one could reasonably expect that they would be separated by a much
closer average distance - say just a couple of feet. With such a high
density, the average time needed for the blind man to find another ice
cream cone would be just a few seconds.

So, whose position is more likely? Your notion that the density of
beneficial sequences in sequence space doesn't matter or my notion
that density does matter? Is your hypothetical situation where a low
density of beneficial states is clustered around a given starting
point really valid outside of intelligent design? If so, name a
non-designed situation where such an unlikely phenomenon has ever been
observed to occur . . .
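
The density argument in the analogy above is simple to simulate; here
is a toy Monte-Carlo sketch in Python (a one-dimensional walker standing
in for the blind man; all numbers are invented for illustration only):

# Toy version of the blind-man search: a random walker on a ring of sites
# looks for targets. Mean search time rises steeply as target density falls.
import random

def mean_search_time(n_sites, n_targets, trials=100, max_steps=500_000):
    total = 0
    for _ in range(trials):
        targets = set(random.sample(range(n_sites), n_targets))
        pos = random.randrange(n_sites)
        steps = 0
        while pos not in targets and steps < max_steps:
            pos = (pos + random.choice((-1, 1))) % n_sites
            steps += 1
        total += steps
    return total / trials

print("dense :", mean_search_time(1_000, 100))  # a target every ~10 sites
print("sparse:", mean_search_time(1_000, 4))    # a target every ~250 sites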

> You cannot simply assume an "average"
> distribution in the absence of background information: you have to find
> out precisely the kind of distribution you are dealing with. And even
> if you do find that the distribution is "stacked", it does not imply
> that an intelligence was involved.

Oh really? You think that stacking the deck as you have done can
happen mindlessly in less than zillions of years of average time?
Come on now! What planet are you from?

> The stacking could occur due to the
> constraints imposed by the very definition of the problem: in the case
> of evolutions, by the physical constraints governing the interactions
> between the molecules involved in biological systems.

Oh, so the physical laws of atoms and molecules force them to
self-assemble themselves in functionally complex systems? Now you are
really reaching. Tell me why the physical constraints of these
molecular machines force all beneficial possibilities to be so close
together? This is simply the most ludicrous notion that I have heard
in a very long time. You would really do well in Vegas with that one!
Try telling them, when they come to arrest you for cheating, that the
deck was stacked because of the physical constraints of the playing
cards.

> In fact, why
> would you expect that the regular and highly predictable physical laws
> governing biochemical reactions would produce a random, "average"
> distribution of "beneficial sequences"?

Because, I don't know of any requirement for them to be clustered
outside of deliberate design - do you? I can see nothing special
about the building blocks that make up living things that would cause
the potentially beneficial systems found in living things to have to
be clustered (just like there is nothing inherent in playing cards
that would cause them to stack themselves in any particular order).
However, if you know of a reason why the physical nature of the
building blocks of life would force them to cluster together despite
having a low density in sequence space, please, do share it with me.
Certainly none of your computer examples have been able to demonstrate
such a necessity. Why then would you expect such a forced clustering
in the potentially beneficial states of living things?

> >>For an extreme
> >>example, consider a space of strings consisting of length 1000, where
> >>each position can be occupied by one of 10 possible characters.
>
> Note, I wrote, "extereme example". My point was *not* invent a
> distribution which makes it likely for evolutiuon to occur (this example
> has about as much to do with evolution as ballet does with quantum
> mechanics), but to show how inadequate your methods are.

Actually, this situation has a lot to do with evolution, and it
points to the real reason why evolution is such a ludicrous idea.
What your illustration shows is that only if the deck is stacked in a
most unlikely way will evolution have the remotest possibility of
working. That is what I am trying to show, and you demonstrated it
very nicely. Unwittingly, you have effectively shown just how
inadequate evolutionary methods are at making much of anything
outside of an intelligently designed stacking of the deck.



> >>Suppose there are only two beneficial strings: ABC........, and
> >>BBC........ (where the dots correspond to the same characters). The
> >>allowed transitions between states are point mutations, that are
> >>equally probable for each position and each character from the
> >>alphabet. Suppose, furthermore, that we start at the beneficial state
> >>ABC. Then, the probability of a transition from ABC... to BBC... in a
> >>single mutation is 1/(10*1000) = 1/10000 (assuming self-loops - i.e.
> >>mutations that do not alter the string, are allowed).
> >
> >
> > You are good so far. But you must ask yourself this question: what
> > are the odds that, out of a sequence space of 1e1000, the only two
> > beneficial sequences with uniquely different functions would sit so
> > close together that the gap between them can be crossed with a
> > probability of 1 in 10,000 per mutation?
>
> Mind-numbingly low. 1000*.9*.1^999, to be precise. But that is not the
> point.

Actually, this is precisely the point. What you are basically saying
is that if there were only one ice cream cone in the entire universe,
it could easily be found provided the starting point of the blind
man's search just so happened to be within an arm's reach of the
cone. That is what you are saying, is it not?
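
Just to pin down the numbers in your own toy model, here is a
two-line check (Python; alphabet of 10, length 1000, exactly as you
defined them):

L, A = 1000, 10
p = 1.0 / (L * A)   # must hit the one right position AND the one right letter
print(p)            # 0.0001, i.e. 1 in 10,000 per mutation
print(1 / p)        # expected (geometric) waiting time: ~10,000 mutations

Finding the one beneficial neighbor a single step away takes ~10,000
tries on average; the problem is that nothing in the model says the
*other* beneficial sequences are anywhere nearby.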



> > Don't you see the problem with this little scenario of yours?
> > Certainly this is a common mistake made by evolutionists, but it is
> > none-the less a fallacy of logic. What you have done is assume that
> > the density of beneficial states is unimportant to the problem of
> > evolution since it is possible to have the beneficial states clustered
> > around your starting point. But such a close proximity of beneficial
> > states is highly unlikely. On average, the beneficial states will be
> > more widely distributed throughout the sequence space.
>
> On average, yes.

On average yes?! How can you say this and yet disagree with my
conclusions?

> But didn't you just say above that the distribution
> of the sequences is irrelevant? That all that matters is "ratio" of
> beneficial sequences?

It is only by determining the ratio of beneficial sequences that you
can obtain a reasonable idea of the likely distribution of these
sequences around any particular starting point. You are committing a
huge fallacy of logic in assuming that, by some magical means, the
distribution could be just right even though the density is truly
minuscule (like finding one atom in zillions of universes the size of
ours).

> (Incidentally, "ratio" and "density" are not
> identical. The distribution I showed you has a relatively high density
> of beneficial sequences, despite a low ratio.)

You are talking about local "density", which, in your scenario, also
has a locally high "ratio". I, on the other hand, was talking about
the total ratio and density of the potential space taken as a whole.
Really, you are very much mistaken to suggest that, measured over the
same unit of state space, the ratio and the density of the states in
question are not equivalent.

> > For example, say that there are 10 beneficial sequences in this
> > sequence space of 1e1000. Now say one of these 10 beneficial
> > sequences just happens to be one change away from your starting point
> > and so the gap is only a random walk of 10,000 steps as you calculated
> > above. However, on average, how long will it take to find any one of
> > the other 9 beneficial states? That is the real question. You rest
> > your faith in evolution on this inane notion that all of these states
> > will be clustered around your starting point. If they were, that
> > certainly would be a fabulous stroke of luck - like it was *designed*
> > that way. But, in real life, outside of intelligent design, such
> > strokes of luck are so remote as to be impossible for all practical
> > purposes. On average we would expect that the other nine sequences
> > would be separated from each other and our starting point by around
> > 1e999 random walk steps/mutations (i.e., on average it is reasonable
> > to expect there to be around 900 differences between each of the 10
> > beneficial sequences). So, even if a starting sequence did happen to
> > be so extraordinarily lucky to be just one positional change away from
> > one of the "winning" sequences, the odds are that this luck will not
> > hold up as well in the evolution of any of the other 9 "winning"
> > sequences this side of a practical eternity of time.
>
> Unless, of course, it follows from the properties of the problem that
> the other 9 beneficial sequences must be close to the starting sequence.

And I am sure you have some way to explain why these 9 other
beneficial sequences would have to be close together outside of
deliberate design? What "properties" of the problem would force such
a low density of novel beneficial states to be so clustered? I see
absolutely no reason to suggest such a necessity. Certainly such a
necessity must be true if evolution is true, but if no reasonable
naturalistic explanation can be given, why should I simply assume such
a necessity? Upon what basis do you make this claim?
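
For what it's worth, the "around 900 differences" estimate above is
easy to check by simulation. A quick sketch of my own (Python; the
same toy alphabet of 10 characters and length 1000 as in the model
under discussion):

import random

def hamming(a, b):
    # number of positions at which two sequences differ
    return sum(x != y for x, y in zip(a, b))

L, alphabet = 1000, "0123456789"
trials, total = 200, 0
for _ in range(trials):
    a = random.choices(alphabet, k=L)
    b = random.choices(alphabet, k=L)
    total += hamming(a, b)
print(total / trials)   # ~900, i.e. 1000 * (1 - 1/10)

Two randomly placed sequences match at a given position only 1 time
in 10, so on average they differ at about 90% of their 1000
positions.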

> > Real time experiments support this position rather nicely. For
> > example, a recent and very interesting paper was published by Lenski
> > et al., entitled "The Evolutionary Origin of Complex Features" in
> > the 2003 May issue of Nature. In this particular experiment the
> > researchers studied 50 different populations, or genomes, of 3,600
> > individuals. Each individual began with 50 lines of code and no
> > ability to perform "logic operations". Those that evolved the ability
> > to perform logic operations were rewarded, and the rewards were larger
> > for operations that were "more complex". After only 15,873 generations,
> > 23 of the genomes yielded descendants capable of carrying out the most
> > complex logic operation: taking two inputs and determining if they are
> > equivalent (the "EQU" function).
>
> I've already covered how you've completely misinterpreted Lenski's
> research in the other post. But let's run with this for a bit:

Let's . . . Oh, and if you would give a link to where you "covered" my
"misinterpretation", that would be appreciated.

> > In principle, 16 mutations (recombinations) coupled with the three
> > instructions that were present in the original digital ancestor could
> > have combined to produce an organism that was able to perform the
> > complex equivalence operation. According to the researchers themselves,
> > "Given the ancestral genome of length 50 and 26 possible instructions
> > at each site, there are ~5.6 x 10^70 genotypes [sequence space]; and
> > even this number underestimates the genotypic space because length
> > evolves."
> >
> > Of course this sequence space was overcome in smaller steps. The
> > researchers arbitrarily defined 6 other sequences as beneficial (NAND,
> > AND, OR, NOR, XOR, and NOT functions).
>
> As a minor quibble, I believe they actually started with NAND (you need
> it for all the other functions). But I could be wrong - I read that
> paper months ago.

You are correct. The fact, though, is that the NAND starting point
was defined as beneficial, and it was not made up of random sequences
of computer code. It was all set up very specifically so that certain
recombinations of code (point mutations were not primarily used,
though they did happen on occasion during recombination events) would
yield certain other types of pre-determined coded functions.
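
Incidentally, the genotype-space figure quoted from the paper above
is easy to verify with a two-line check (Python; the 50-site,
26-instruction numbers are the paper's own):

print(26 ** 50)                   # exact count of length-50 genomes
print(format(26 ** 50, ".2e"))    # ~5.61e+70, matching the quoted ~5.6 x 10^70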

> > those in the reward-all environment (2.15 x 10^7 versus 1.22 x 10^7;
> > P<0.0001, Mann-Whitney test), because they tended to have smaller
> > genomes, faster generations, and thus turn over more quickly. However,
> > all populations explored only a tiny fraction of the total genotypic
> > space. Given the ancestral genome of length 50 and 26 possible
> > instructions at each site, there are ~5.6 x 10^70 genotypes; and even
> > this number underestimates the genotypic space because length
> > evolves."
>
> And after years of painstaking research, Sean finally invents the wheel.
> Yes, evolution does not pop complex systems out of thin air, but
> constructs them through the integration and co-optation of simpler
> functional components. Move along, folks, nothing to see here!

What this shows is that if the "simpler" components aren't defined as
"beneficial" then a system of somewhat higher complexity will not
evolve at all - period - even given zillions of years of time. Truly,
this means that there really isn't anything to see here. Nothing
evolves without the deck being stacked by intelligent design. That is
all this Lenski experiment showed.

> > Isn't that just fascinating? When the intermediate stepping stone
> > functions were removed, the neutral gap that was created successfully
> > blocked the evolution of the EQU function, which happened *not* to be
> > right next door to their starting point. Of course, this is only to
> > be expected based on statistical averages that go strongly against the
> > notion that very many possible starting points would just happen to be
> > very close to an EQU functional sequence in such a vast sequence
> > space.
>
> Here's a question for you. There were only 5 beneficial functions in
> that big old sequence space of yours.

Actually, including the starting and ending points, there were 7
defined beneficial sequences in this sequence space (NAND, AND, OR,
NOR, XOR, NOT, and EQU functions).

> They are all very standard
> Boolean functions: in no way were they specifically designed by Lenski
> et al. to ease the way into evolving the EQU function.

Actually, they very much were designed by Lenski et al. to ease the
way along the path to the EQU sequence. The original code was set up
with very specific lines of code that could, when certain
recombinations occurred, give rise to each of these logic functions.
The lines of code were not random, and not all of them were needed
for the original NAND function to operate. In fact, the researchers
knew the approximate rate of evolution to expect ahead of time, based
on their programming of the coded sequences, the rate of
recombination of these sequences, the size of the sequence space, and
the distance between each step along the pathway. It really was a
very nice setup for success. Read the paper again and you will see
that this is true.

> How come
> they were all sufficiently close in sequence space to one another, when
> according to you such a thing is so highly improbable?

Because they were designed to be close together deliberately. The
deck was stacked on purpose. I mean really, you can't be suggesting
that these 7 beneficial states just happened to be clustered together
in a state space of 1e70 by the mindless restrictions of the program,
do you? The program was set up with the restrictions stacked in a
particular way so that only these 7 states could evolve, and so that
each subsequent state was just a couple of steps away from the
current state. No other function was set up to evolve, so no other
novel function evolved. These lines of code did not get together and
make a calculator program or a photo-editing program, or even a
simple program to open the CD player. That should tell you something
. . . This Lenski experiment was *designed* to succeed like it did.
Without such input of intelligent deck stacking, it never would have
worked like it did.
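
To make the deck-stacking point concrete, here is a toy hill-climbing
sketch of my own (Python; this is nothing like Avida's actual code -
the 20-bit target, population size, mutation rate, and generation cap
are all made-up illustration values). The only thing that changes
between the two runs is whether the intermediate steps are rewarded:

import random

L, POP, GENS, MU = 20, 100, 2000, 0.02   # toy parameters

def evolve(fitness):
    # random initial population of 20-bit genomes
    pop = [[random.randint(0, 1) for _ in range(L)] for _ in range(POP)]
    for gen in range(GENS):
        if any(sum(g) == L for g in pop):
            return gen                    # all-ones target reached
        new_pop = []
        for _ in range(POP):
            a, b = random.sample(pop, 2)            # tournament selection
            parent = a if fitness(a) >= fitness(b) else b
            child = [1 - bit if random.random() < MU else bit
                     for bit in parent]             # point mutations
            new_pop.append(child)
        pop = new_pop
    return None                           # never found

print("graded reward: ", evolve(lambda g: sum(g)))
print("all-or-nothing:", evolve(lambda g: 1 if sum(g) == L else 0))

With the graded reward, the target typically shows up within a few
hundred generations; with the all-or-nothing reward, the population
just drifts and the run almost always comes back None. That is the
whole issue in miniature: the graded reward schedule *is* the stacked
deck.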

> > Now, isn't this consistent with my predictions? This experiment was
> > successful because the intelligent designers were capable of defining
> > what sequences were "beneficial" for their evolving "organisms." If
> > enough sequences are defined as beneficial and they are placed in just
> > the right way, with the right number of spaces between them, then
> > certainly such a high ratio will result in rapid evolution - as we saw
> > here. However, when neutral non-defined gaps are present, they are a
> > real problem for evolution. In this case, a gap of just 16 neutral
> > mutations effectively blocked the evolution of the EQU function.
>
> You are not even close. Lenski et. al. didn't define which *sequences*
> were "beneficial".

Yes, they did exactly that. Read the paper again. They deliberately
wrote the starting lines of code in a meaningful way and arbitrarily
defined which recombinations would be "beneficial". They say it in
exactly that way. They absolutely say that they defined what was and
what was not "beneficial".

> They didn't even design functions to serve
> specifically as stepping stones in the evolutionary pathways of EQ.

Yes, they did, in that they wrote the original code so that such
pre-defined "beneficial" codes could form through a series of
recombinations of lines of code.

> What they have done is to name some functions of intermediate complexity
> that might be beneficial to the organism.

You obviously either haven't read the original paper or you don't
understand what it said. The researchers openly admit to arbitrarily
defining the "intermediate" states as beneficial. This is
demonstrated by the fact that they went on to remove the "beneficial"
definition from these intermediate states. Without this arbitrary
assignment of benefit to the intermediate states, the EQU state did
not evolve. Go back and read the paper again. It was the researchers
who defined the states. The states themselves obviously didn't have
inherent benefits in the "world" they were evolving in outside of the
researchers' definitions of them.

> They certainly did not tell
> their program how to reach these functions, or what the systems
> performing these functions might look like, but simply indicated that
> there are functions at varying levels of complexity that might be useful
> to an organism in its environment.

Wrong again. They did in fact tell their program exactly which
states to reward and how to reward them when present. They told the
program ahead of time exactly what those states would look like, so
that they would be recognized and treated as beneficial when they
arrived on the scene.

You really don't seem to have a clue how this experiment was done. I
really don't understand how you could make such statements as these
if you had actually read the paper.

> Thus, they have demonstrated exactly
> what they set out to: that in evolution, complex functional features are
> acquired through co-optation and modification of simpler ones.

They did nothing of the sort. All they did was show that stacking
the deck by intelligent design really does work. The problem is that
evolution is supposed to create incredible diversity and
informational complexity without any intelligent intervention ever
having been required. So, you evolutionists are back to ground zero.
There simply is no evolution, outside of intelligent design, beyond
the lowest levels of functional/informational complexity.

<snip>


> >>(Again, it is a
> >>gross, meaningless over-simplification to model evolution as a random
> >>walk over a frozen N-dimensional sequence space, but my point is that
> >>your calculations are wrong even for that relatively simple model.)
> >
> > Come now Robin - who is trying to stack the deck artificially in their
> > own favor here? My calculations are not based on the assumption of a
> > stacked deck like your calculations are, but upon a more likely
> > distribution of beneficial sequences in sequence space. The fact of
> > the matter is that sequence space does indeed contain vastly more
> > absolutely non-beneficial sequences than it does those that are even
> > remotely beneficial.
>
> Yes, but your calculations are based on the equally unfounded assumption
> that the deck is not stacked in any way, shape, or form. (That is, if
> the sequences were really distributed evenly in your frozen sequence
> space, then your probability calculation would still be off, but not by
> too much.)

Not by too much? Hmmmmm . . . So, you are saying that if the
sequence space were set up even close to the way in which I am
suggesting, then my calculations would be pretty much correct? So,
unless the sequence space looks like you envision it, all nice and
neatly clustered around your pre-arranged starting point, then I am
basically right? So, either the deck is stacked pretty much like you
suggest, or the deck is more randomly distributed like I suggest. If
it is stacked, then you are correct and evolution is saved. If the
deck is more randomly distributed like I suggest, then evolution is
false and should be discarded as untenable - correct?

Now where did I miss it? You said at the beginning that my
calculations were completely off base given my own position, and that
you were going to correct my math. You said that I needed special
training in statistics. Now, how can my calculations be pretty much
on target given my hypothesis, and yet I know nothing about
statistics?

> What makes you think that the laws of physics do not stack
> the deck sufficiently to make evolution possible?

More importantly, what makes you think that they do? I've never seen
a mindless process stack the deck like this, have you? Where are your
examples of mindless processes stacking the deck in such a way as you
suggest, outside of the aid of intelligent design?

> You may feel that
> they can't: but in the meantime, you should be striving to find out what
> the actual distribution is, rather than assuming it is unstacked. (Not
> that this would make your model relevant, but it'll be a small step in
> the right direction.)

Actually, an unstacked deck would make my model very relevant indeed.
You admit as much yourself when you say that my calculations are
pretty much correct given that the hypothesis of an unstacked deck is
true. Now, the ball is in your court. It is so extremely
counterintuitive to me that the deck would be unstacked that such an
assertion demands equivalent evidence. Where do you see such deck
stacking outside of intelligent design? That is the real question
here.

> > In fact, there is an entire theory called the
> > "Neutral Theory of Evolution". Of all mutations that occur in every
> > generation in, say, humans (around 200 to 300 per generation), the
> > large majority of them are completely "neutral" and those few that are
> > functional are almost always detrimental. This ratio of beneficial to
> > non-beneficial is truly small and gets exponentially smaller with each
> > step up the ladder of specified functional complexity. Truly,
> > evolution gets into very deep weeds very quickly beyond the lowest
> > levels of functional/informational complexity.
>
> The fact that the vast majority of mutations are neutral does not imply
> that there exists any point where there is no opportunity for a
> beneficial mutation. And where such an opportunity presents itself,
> evolution will eventually find it, given large enough populations and
> sufficient times.

Yes, if by "sufficient time" you mean zillions of years - even for
extremely large populations.

> >>>It will take
> >>>just over 1,000 seconds - a bit less than 20 minutes on average. But,
> >>>what happens if at higher levels of functional complexity the density
> >>>of beneficial functions decreases exponentially with each step up the
> >>>ladder? The rate of search stays the same, but the junk sequences
> >>>increase exponentially and so the time required to find the rarer and
> >>>rarer beneficial states also increases exponentially.
> >>
> >>The above is only true if you use the following search algorithm:
> >>
> >> 1. Generate a completely random N-character sequence
> >> 2. If the sequence is beneficial, say "OK";
> >> Otherwise, go to step 1.
> >
> > Actually the above is also true if you start with a likely starting
> > point. A likely starting point will be an average distance away from
> > the next closest beneficial sequence. A random mutation to a sequence
> > that does not find the new beneficial sequence will not be selectable
> > as advantageous and a random walk will begin.
>
> Actually, your last paragraph will be approximately true only if all
> your "beneficial" points are uniformly spread out through your sequence
> space.

In other words, if they aren't stacked in some extraordinarily
fortuitous fashion?

> Even then, your probability calculation will be off by some
> orders of magnitude, since you will actually need to apply combinatorial
> formulas to compute these probabilities correctly. But, I suppose,
> it'll be close enough.

My calculations will not be off too far. And, even if they are off
by a few orders of magnitude, it doesn't matter compared to the
numbers involved. As you say, the rough estimates involved here are
clearly "close enough" to give a very good idea of the problem. My
math is not "way off" as you originally indicated. If anything, you
have a conceptual problem with my hypothesis, not with my
statistics/math. It basically boils down to this: either the deck
was stacked by a mindless process or by a mindful one. You have yet
to provide any convincing evidence that a mindless process can stack
a deck the way it would have to have been stacked for life forms to
be as diverse and complex as they are, outside of a lot of help from
intelligent design.
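
As an aside, the simple search loop you sketched above (generate a
random sequence, keep it only if it is beneficial) actually makes the
ratio-to-time relationship explicit. A minimal sketch of my own
(Python; the ratios below are arbitrary illustration values):

import random

def draws_until_hit(ratio, trials=200):
    # average number of random draws before hitting a "beneficial"
    # sequence, when a fraction `ratio` of all sequences qualifies
    total = 0
    for _ in range(trials):
        n = 1
        while random.random() > ratio:
            n += 1
        total += n
    return total / trials

for ratio in (1e-1, 1e-3, 1e-5):
    print(f"ratio {ratio:g}: ~{draws_until_hit(ratio):.0f} draws on average")

The expected number of draws is simply 1/ratio: every order of
magnitude the ratio of beneficial sequences drops, the average
waiting time gains an order of magnitude. That is the entire point
of my argument.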

<snip>


> >> I could also
> >>very easily construct an example where the ratio is nearly one, yet a
> >>random walk starting at a given beneficial sequence would stall with a
> >>very high probability.
> >
> > Oh really? You can construct a scenario where all sequences are
> > beneficial and yet evolution cannot evolve a new one? Come on now . .
> > . now you're just being silly. But I certainly would like to see you
> > try and set up such a scenario. I think it would be most
> > entertaining.
>
> I didn't say all sequences are beneficial, Sean. That *would* be silly.
> I did say that the ratio *approaches* one, but is not quite that.
> But, here you are:
>
> Same "sequence space" as before, but now a sequence is "beneficial" if
> it is AAAAAAAAAA......AAA (all A's), or it differs from AAAAA...AAA by
> at least 2 amino acids. All other sequences are *harmful* - if the
> random walk ever stumbles onto one, it will die off, and will need to
> return to its starting point. (This means there are exactly 1000*9 +
> (1000*999/2)*81 or about 4.02e6 harmful sequences, and 1e1000-4.02e6 or
> about 1e1000 beneficial sequences: that is, virtually every sequence is
> beneficial.) Again, the allowed transitions are point mutations, and
> the starting point is none other AAAAAAA...AAA. Now, will this random
> walk ever find another beneficial sequence?

Your math here seems to be just a bit off. For example, if, out of
1e1000 sequences, the number of beneficial sequences were 1e999, the
ratio of beneficial sequences would be 1 in 10. At that ratio, the
average distance to a new beneficial function would not be "two amino
acid changes away", but less than one amino acid change away. The
ratio created by "at least 2 amino acid changes" is less than 1 in
400, not less than 1 in 10 as you suggest here.

Also, even if all sequences less than 2 amino acid changes away were
detrimental (which is very unlikely), an average bacterial colony of
100 billion or so individuals would cross this 2 amino acid gap in
short order, since several members of a population that size would
experience a double mutation in a sequence of this size during the
course of just one generation.
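
Here is the back-of-envelope arithmetic behind that claim (Python;
the per-site mutation rates and population sizes are assumed
illustration values of mine, not measured ones):

from math import comb

L = 1000                        # sites in the toy sequence
for u in (1e-4, 1e-6, 1e-8):    # assumed per-site mutation rate
    # probability a single replication carries exactly two mutations
    p2 = comb(L, 2) * u**2 * (1 - u)**(L - 2)
    for N in (1e9, 1e11):       # assumed population size
        print(f"u={u:g}, N={N:g}: ~{p2 * N:g} double mutants per generation")

Even at a per-site rate of 1e-8, a population of 1e11 throws up a
handful of double mutants every single generation, so a "moat" that
is only one detrimental step wide is not much of a barrier.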



> > And if you wish to model evolution as a walk between tight clusters of
> > beneficial sequences in an otherwise extraordinarily low density
> > sequence space, then I have some oceanfront property in Arizona to
> > sell you at a great price.
>
> If I did wish to model evolution this way, then I would gladly buy this
> property off your hands. And then sell it back to you at twice the
> price, because it would still be better than the model you propose.

LOL - OK, you just keep thinking that way. But, until you have some
evidence to support your wishful-thinking hypothesis of a mindless
stacking of the deck, what is there to make your position attractive
or even remotely logical?

> Cheers,
> RobinGoodfellow.

Sean
www.naturalselection.0catch.com
