What Makes Something "Beneficial"?


Sean Pitman

Dec 22, 2003, 5:05:48 PM
From: Chris Merli (clm...@insightbb.com)
Subject: Re: Bill Roger's question for Sean Pitman
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=CQsvb.203225%24275.755591%40attbi_s53&rnum=5
Date: 2003-11-21 10:31:29 PST
>
> If we simply gave a definition to each three-letter
> word, then they would all have meaning. Perhaps
> more directly, the three-letter words that have
> meaning have it only due to our defining them as
> having meaning. In a similar way you have
> dismissed most of the protein sequences as useless
> but given the proper context couldn't every protein
> have some function? I guess this could be easily
> disproved if you simply showed one that has no
> function in any context.

The context of protein meaning/function is created by the individual
organism as it interacts with its particular environment. It really
doesn't matter if a particular protein might have some beneficial
meaning/function in a different organism/environment. The question
is, does it have some sort of beneficial function where it is right
now in the particular organism in which it might be found? Nature
cannot select to keep a particular protein in a particular gene pool
just because it may have some sort of beneficial function elsewhere.
It really doesn't matter what its function may or may not be
elsewhere. Nature only looks at what works right now in a particular
creature. Nature cannot plan ahead or say to itself, "Hey, this
protein would work great if only it was in a different creature or a
different place in the genome." If it is not working right now where
it is, nature simply will not select to keep it.

The fact of the matter is that from the perspective of a particular
organism the vast majority of possible proteins do not have a
beneficial function. This ratio of beneficial vs. non-beneficial is
much higher for those functions that require relatively small proteins
(20 or 30aa), but it decreases in an exponential manner for each
additional fairly specified amino acid that is required for minimum
function of a particular type to be realized. After just a few
hundred fairly specified amino acids, the density of beneficial vs.
non-beneficial sequences becomes so minuscule that the mindless
processes of evolutionary change simply cannot find new sequences with
new types of functions very easily. At the level of just a few
thousand fairly specified amino acids evolutionary processes stall out
completely this side of trillions upon trillions of years. Examples
of functions that require such levels of specified complexity include
bacterial motility systems, such as the flagellar apparatus, which
requires at least 20 to 30 different types of proteins totaling well
over 5,000 amino acids working together at the same time for the
function of flagellar motility to be realized. Evolutionary processes
simply cannot evolve new types of functions at such levels of minimum
specified complexity in what anyone would call a reasonable amount of
time.

Sean
www.naturalselection.0catch.com

Chris Merli

Dec 22, 2003, 5:44:47 PM

"Sean Pitman" <seanpi...@naturalselection.0catch.com> wrote in message
news:80d0c26f.03122...@posting.google.com...

I think this simply supports my point. You dismiss proteins as useless
because they are not functional in the current individual at the present
moment. There is no reason to suspect they were useless in a recent
ancestor under different conditions. In order to correctly identify
proteins that are useful you would have to identify all active sites (notice
that whole proteins are not the issue, simply active sites) that were useful not
only in a current organism but in all possible ancestors. A protein that
was useful in an ancient species and is not significantly altered may
easily be recruited in a new evolutionary event.


>
> The fact of the matter is that from the perspective of a particular
> organism the vast majority of possible proteins do not have a
> beneficial function. This ratio of beneficial vs. non-beneficial is
> much higher for those functions that require relatively small proteins
> (20 or 30aa), but it decreases in an exponential manner for each
> additional fairly specified amino acid that is required for minimum
> function of a particular type to be realized. After just a few
> hundred fairly specified amino acids, the density of beneficial vs.
> non-beneficial sequences becomes so minuscule that the mindless
> processes of evolutionary change simply cannot find new sequences with
> new types of functions very easily. At the level of just a few
> thousand fairly specified amino acids evolutionary processes stall out
> completely this side of trillions upon trillions of years.

The trouble, as was pointed out elsewhere, is that proteins are not one
massive single function entity but a series of smaller active sites. To
make matters worse, usually these active sites require only a small number of
specific amino acids.

> Examples
> of functions that require such levels of specified complexity include
> bacterial motility systems, such as the flagellar apparatus, which
> requires at least 20 to 30 different types of proteins totaling well
> over 5,000 amino acids working together at the same time for the
> function of flagellar motility to be realized.

I am afraid only a small fraction of these 5,000 amino acids are really
critical to the function of the proteins. This seems to be the fundamental
misunderstanding that is driving the neutral gaps idea.

RobinGoodfellow

Dec 23, 2003, 1:29:00 AM
seanpi...@naturalselection.0catch.com (Sean Pitman) wrote in message news:<80d0c26f.03122...@posting.google.com>...

This is correct, but it cuts both ways, as Chris already pointed out.
Just because something is not beneficial at time X under a certain set
of conditions does not mean it would not be beneficial - and thus
selected for - at time Y under a different set of conditions.

Take your lactase argument, for example. You claim that the lactase
function must be very rare because we have never seen lactase evolve
in the wild in "billions of bacterial generations". (I'd love to see
how you've arrived at this figure, by the way.) However, as others
have pointed out to you, the ability to process lactose just wouldn't
be all that beneficial to most bacteria, since an overwhelming
majority of them do not live in lactose-rich environments. Even for
those bacteria that find themselves in such environments, evolving a
rudimentary lactase function will not necessarily be beneficial, since
they will be competing with bacteria that are already highly efficient
at hydrolyzing lactose. That is, currently in the wild, there isn't
much of a niche for evolving the lactase function. When such a niche
does appear, as happened in Hall's experiment, or in the wild with
the nylonase enzyme, evolution eventually will find a rudimentary
solution capable of filling that niche. If it is sufficiently useful
for the organism, this solution can then be refined, by incremental
mutations, to something better suited for performing the new function.
Such mutations are not reversible - since a reverse mutation will
result in worse performance and will be selected out. This is how
specificity of sequence to function evolves. The notion of a function
*requiring* a specific sequence is complete bollocks unless you can
demonstrate, clearly and unambiguously, that the function can only be
carried out by a molecular system obeying some highly specific
physical and chemical constraints. For instance, in case of the
lactase function, you would need to demonstrate that only a handful of
protein folds are suitable for hydrolyzing lactose, and that the
proteins must be composed of some very specific amino acids to
maintain these rare folds. So far, you've done a fine job of
asserting that such functions exist, but provided no evidence that
they do. In fact, if our resident biochemist sweetness is to be
believed, there is no black art involved in hydrolyzing lactose, and a
rudimentary molecule capable of doing so should be easy to obtain.

> The fact of the matter is that from the perspective of a particular
> organism the vast majority of possible proteins do not have a
> beneficial function. This ratio of beneficial vs. non-beneficial is
> much higher for those functions that require relatively small proteins
> (20 or 30aa), but it decreases in an exponential manner for each
> additional fairly specified amino acid that is required for minimum
> function of a particular type to be realized.

Again, this is only valid once you show that only a certain (long)
sequence of amino acids is *absolutely required* for the function.
Which you haven't.

But let us dispense with the silly notion of "function" altogether,
shall we? After all, you and I both know that evolution is
non-teleological: that is, from the evolutionary perspective, it
is no more the function of the flagellum to grant motion to bacteria
than it is the function of a tornado to spin in circles and toss cows.
Your claim is grander: you claim that evolution cannot produce any
system of a certain complexity, as measured by, say, the number of
interacting amino acids in the system. So, let me ask you this:
of all possible 5000 amino acid sequences, how on earth do you or
anyone determine which ones are and which ones aren't "beneficial",
especially given ever-changing contexts? After all, this is crucial
for your "ratio of beneficial sequences" argument (which is actually
irrelevant anyway, as I'll explain below).

> After just a few
> hundred fairly specified amino acids, the density of beneficial vs.
> non-beneficial sequences becomes so minuscule that the mindless
> processes of evolutionary change simply cannot find new sequences with
> new types of functions very easily.

Aha! You are equating "density" and "ratio", which is simply
wrong. If one were to adopt your model of sequence space (and one
really shouldn't!), the ratio would tell you about the relative number
of "beneficial" sequences, but nothing about their *density* - i.e.
distribution throughout the sequence space. That is, the relative
number of sequences could be tiny, but they could be either all
bunched together (very dense), or be separate, equally spaced-out
specks in the space (very sparse), or could be bunched into clusters
that are connected with one another in any way imaginable, to be
consistent with just about any hypothesis about the expected time that
it would take a random walk to get from one cluster to another.
Again, a random walk over a frozen n-dimensional sequence space is a
terrible model for evolution, but even if it were right, your
"statistics" would still be meaningless for the above reason. In
fact, your probability calculation only applies if evolution worked by
sampling entire n-amino acid sequences at random from this space -
which is exactly why your critics accuse you of thinking that
evolution operates this way.
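
The ratio-versus-density distinction can be shown with a minimal sketch, assuming a toy one-dimensional ring of 1,000 positions in place of a real sequence space. The names SIZE, clustered, and spread are illustrative and do not come from anyone's model in this thread. Both layouts mark the same fraction of positions as "beneficial", yet the typical distance from an arbitrary starting point to the nearest one differs by roughly a factor of ten.

```python
def ring_distance(a, b, size):
    """Shortest number of single steps between positions a and b on a ring."""
    d = abs(a - b) % size
    return min(d, size - d)


def mean_distance_to_nearest(targets, size):
    """Average, over every starting position, of the distance to the closest target."""
    total = sum(
        min(ring_distance(p, t, size) for t in targets) for p in range(size)
    )
    return total / size


SIZE = 1000                           # hypothetical toy "sequence space"
clustered = list(range(10))           # ten targets bunched together
spread = list(range(0, SIZE, 100))    # ten targets evenly spaced

for name, targets in (("clustered", clustered), ("spread", spread)):
    ratio = len(targets) / SIZE       # identical for both layouts
    print(name, "ratio:", ratio,
          "mean distance to nearest target:",
          mean_distance_to_nearest(targets, SIZE))
```

The ratio alone therefore says nothing about how far a walk has to go. That depends on how the targets are laid out and on where the walk starts.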

> At the level of just a few
> thousand fairly specified amino acids evolutionary processes stall out
> completely this side of trillions upon trillions of years. Examples
> of functions that require such levels of specified complexity include
> bacterial motility systems, such as the flagellar apparatus, which
> requires at least 20 to 30 different types of proteins totaling well
> over 5,000 amino acids working together at the same time for the
> function of flagellar motility to be realized.

Really? When people give you examples of large proteins evolving, you
are quick to point out that many of the amino acids in those proteins
are irrelevant, and the number of actual "specified" amino acids is
very small. So what on earth makes you think that all the amino
acids, in all the proteins involved in flagellar assembly are relevant
and invariant?

> Evolutionary processes
> simply cannot evolve new types of functions at such levels of minimum
> specified complexity in what anyone would call a reasonable amount of
> time.

I agree. A few hundred million years is definitely unreasonable,
especially if you want the whole process reproduced in the lab in the
blink of an eye.

> Sean
> www.naturalselection.0catch.com

Cheers,
Robin.

howard hershey

Dec 24, 2003, 2:57:30 PM

Sean, to make a long story short, the ratio of beneficial versus
non-beneficial you give is utterly irrelevant to anything. The
denominator of your ratio is always *total sequence space* based on
taking 1/20 to the power of the total number of amino acids (or minimum
number, which, in those cases you want to be unevolvable, is always the
same as total number of amino acids). That denominator is utterly
irrelevant *unless* your model is that every new functional protein
arises by a random walk from a random protein sequence. The *real*
model of evolution assumes a quite different mechanism, the modification
of a pre-existing protein (or duplicate thereof) in a specific organism.

The sequence space that is *relevant* to evolutionary mechanism is the
sequence space encoded by the proteins in that organism. That is, the
only relevant question is if, in *this* non-random sequence space, there
is a sequence x number of changes away from a selectable functionality
required by an environmental change. The sequence space in any given
organism is most certainly not *even* a random sample of total sequence
space. Far from it. Almost every one of the proteins has some
functional utility already. That alone makes your "ratio", based as it
is on a denominator of *total sequence space*, GIGO.
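
To put rough numbers on the two denominators, here is a minimal sketch, assuming a hypothetical 300-residue protein and counting only point substitutions (insertions, deletions, duplications, and recombination are ignored). The function names are illustrative. It compares the size of total sequence space with the number of sequences within one or two substitutions of a single existing sequence.

```python
from math import comb

ALPHABET = 20  # the standard amino acids


def total_sequences(length):
    """Size of the full sequence space for a given length."""
    return ALPHABET ** length


def neighbors_within(length, max_changes):
    """Count sequences differing from one given sequence at 1..max_changes positions."""
    return sum(
        comb(length, k) * (ALPHABET - 1) ** k
        for k in range(1, max_changes + 1)
    )


length = 300  # hypothetical protein length, chosen only for illustration
print("digits in total sequence space:", len(str(total_sequences(length))))
print("sequences one substitution away:", neighbors_within(length, 1))
print("sequences within two substitutions:", neighbors_within(length, 2))
```

The first number is astronomically large, while the neighborhood that mutation can actually reach from an existing protein in a step or two is small enough to enumerate.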

> After just a few
> hundred fairly specified amino acids, the density of beneficial vs.
> non-beneficial sequences becomes so minuscule that the mindless
> processes of evolutionary change simply cannot find new sequences with
> new types of functions very easily. At the level of just a few
> thousand fairly specified amino acids evolutionary processes stall out
> completely this side of trillions upon trillions of years.

Only if one were ignorant enough to use a denominator of *total sequence
space* that requires evolution to search through total sequence space
for functional sequences. No model of evolution does that.

> Examples
> of functions that require such levels of specified complexity include
> bacterial motility systems, such as the flagellar apparatus, which
> requires at least 20 to 30 different types of proteins totaling well
> over 5,000 amino acids working together at the same time for the
> function of flagellar motility to be realized.

And Sean has not *even* justified the number of 5,000.

RobinGoodfellow

Dec 24, 2003, 6:36:18 PM
howard hershey wrote:

[snip]


>
> Sean, to make a long story short, the ratio of beneficial versus
> non-beneficial you give is utterly irrelevant to anything. The
> denominator of your ratio is always *total sequence space* based on
> taking 1/20 to the power of the total number of amino acids (or minimum
> number, which, in those cases you want to be unevolvable, is always the
> same as total number of amino acids). That denominator is utterly
> irrelevant *unless* your model is that every new functional protein
> arises by a random walk from a random protein sequence. The *real*
> model of evolution assumes a quite different mechanism, the modification
> of a pre-existing protein (or duplicate thereof) in a specific organism.

It is even worse than that. Even random walks starting at random points
in N-dimensional space can, in theory, be used to sample the states
with a desired property X (such as Sean's "beneficial sequences"), even
if the number of such states is exponentially small compared to the
total state space size. Such random walks are at the heart of
Monte-Carlo methods, used to solve a wide variety of problems in
physics, statistics, computer science, etc. The time requirements
for such a random walk would depend on the distribution of valid states
(i.e. "beneficial sequences") in the space, the transition probabilities
between each state, and, to a lesser extent, the starting point. Of
course, the size of each state (i.e. the dimension of the space) is also
a factor, but the key point (and what makes Monte-Carlo techniques so
useful) is that the relationship between time and state size need not be
exponential. Depending on the specific details described above, the
time requirement may be any function of state size - possibly even a
linear function! Again, I totally agree that a simple Monte-Carlo
process is a pitiful model for evolution - but Sean's statistics are way
off even when applied to this model. His probabilities would only be
valid if evolution worked by repeatedly generating N-amino-acid
sequences de novo *every time*, with selection only keeping "beneficial"
sequences. That is, Sean's calculation reflects the probability of
finding a desired state with property X by blind, uniform random
sampling of N-dimensional space. The best thing that could be said
about such an attempt to model evolution is that it is laughable. But
it appears that Sean does not realize that he is making this mistake,
and keeps repeating his probabilities like a broken record.
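
The difference between blind uniform sampling and a stepwise walk can be sketched as follows. This is a toy under loudly stated assumptions: bit strings stand in for sequences, the target is all ones, and every partial match is treated as selectable, which is exactly the kind of landscape detail under dispute in this thread. It is not a model of evolution and not a full Monte-Carlo method, only a random walk with a simple accept/reject rule. The function names are illustrative.

```python
import random


def uniform_sampling_expected_tries(length, alphabet=2):
    """Expected number of blind uniform draws needed to hit one specific sequence."""
    return alphabet ** length


def stepwise_walk(length, rng):
    """Single-site random walk on bit strings that rejects any step lowering the score.

    The score is the number of positions matching an all-ones target, so this
    assumes (for illustration only) that every partial match is selectable.
    Returns the number of proposed steps until the target is reached.
    """
    state = [0] * length
    steps = 0
    while sum(state) < length:
        i = rng.randrange(length)   # propose a change at one random position
        steps += 1
        if state[i] == 0:           # fixing a mismatch raises the score: accept
            state[i] = 1
        # otherwise the flip would lower the score: reject and keep the state
    return steps


rng = random.Random(0)
length = 30
print("blind sampling, expected tries:", uniform_sampling_expected_tries(length))
print("stepwise walk, steps taken:", stepwise_walk(length, rng))
```

On this particular landscape the walk needs on the order of length times log(length) steps, while blind sampling needs on the order of 2 to the power of length draws. A rugged landscape would behave differently, which is the point: the time depends on the distribution of states and the transition rule, not on the size of the space alone.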

[snip]

Sean Pitman

Dec 29, 2003, 7:13:57 PM
howard hershey <hers...@indiana.edu> wrote in message news:<bscr3q$12r$1...@hood.uits.indiana.edu>...


> Sean, to make a long story short, the ratio of beneficial versus
> non-beneficial you give is utterly irrelevant to anything. The
> denominator of your ratio is always *total sequence space* based on
> taking 1/20 to the power of the total number of amino acids (or minimum
> number, which, in those cases you want to be unevolvable, is always the
> same as total number of amino acids).

Yes. I'm interested in the total number of amino acids required for a
particular type of function to be realized at its most minimum
beneficial level of function. This level is very different depending
on the type of function in question. Those types of new functions
that require more than a couple thousand fairly specified amino acids
working together at the same time simply do not evolve no matter what
you start with, functional or not.

> That denominator is utterly
> irrelevant *unless* your model is that every new functional protein
> arises by a random walk from a random protein sequence. The *real*
> model of evolution assumes a quite different mechanism, the modification
> of a pre-existing protein (or duplicate thereof) in a specific organism.

That's exactly the model that I'm talking about. Starting with
pre-existing proteins having pre-existing individual and collective
beneficial functions, you will not see the evolution of any new type
of protein function that requires more than a couple thousand fairly
specified amino acids working together at the same time. You can use
duplication, point mutation, translocation, frame shifts, etc., and
they will all fail to get you a new type of function that goes very
far beyond the lowest levels of functional complexity toward any new
type of function. Simple up-regulation of what you already have will
only get you so far. Evolving new sequences with new functions just
doesn't happen beyond very low levels of functional complexity.

> The sequence space that is *relevant* to evolutionary mechanism is the
> sequence space encoded by the proteins in that organism.

Not so. The sequence space that is relevant to the evolution of a
particular gene pool is the sequence space that surrounds all possible
beneficial functions that could be used by that type of organism in
its current environment at various levels of functional complexity.
Remember, the sequence space changes exponentially depending upon the
level of complexity in question.

> That is, the
> only relevant question is if, in *this* non-random sequence space, there
> is a sequence x number of changes away from a selectable functionality
> required by an environmental change.

And the lower the level of functional complexity the more of such
functions there will be within a couple steps of what happens to
already be there in the gene pool. However, regardless of the
starting point, the average distance to those functions at higher and
higher levels of complexity increases exponentially. This means that
no matter what life form or gene pool you start with, it will be
limited in its evolutionary potential to only those functions that
require, at minimum, no more than a couple thousand fairly specified
amino acids working together at the same time. The reason for this is
that you just will never find an organism that just happens to be only
a few steps away from any beneficial function at such a level of
complexity even if you tried out zillions of organisms and gene pools
over zillions of years of time.

> The sequence space in any given
> organism is most certainly not *even* a random sample of total sequence
> space. Far from it. Almost every one of the proteins has some
> functional utility already. That alone makes your "ratio", based as it
> is on a denominator of *total sequence space*, GIGO.

Certainly the proteins that a creature starts with are all pretty much
beneficial - in good working order. I don't understand how you think
that I am arguing against this concept. I'm clearly not arguing
against this at all. In fact I use this concept as the basis for my
argument. You start with something that is fully functional. All the
proteins are working in a very beneficial way. Now, get something new
- a new type of function. That is the goal of evolution, new types of
beneficial functions. For a while evolution does pretty well - at the
lowest levels of functional complexity. However, with each step up
the ladder of functional complexity (i.e., each additional fairly
specified minimum amino acid requirement), evolution does
exponentially worse and worse until it completely stalls out well
before the level of just a couple thousand fairly specified amino
acids is reached. New functions at such a level and beyond just do
not evolve in any creature no matter what their original starting
point was and no matter if trillions upon trillions of years are
provided.

Sean
www.naturalselection.0catch.com

howard hershey

Dec 30, 2003, 12:21:00 PM

Sean Pitman wrote:
> howard hershey <hers...@indiana.edu> wrote in message news:<bscr3q$12r$1...@hood.uits.indiana.edu>...
>
>
>>Sean, to make a long story short, the ratio of beneficial versus
>>non-beneficial you give is utterly irrelevant to anything. The
>>denominator of your ratio is always *total sequence space* based on
>>taking 1/20 to the power of the total number of amino acids (or minimum
>>number, which, in those cases you want to be unevolvable, is always the
>>same as total number of amino acids).
>
>
> Yes.

"Yes." to what? To the fact that the 'minimum number of amino acids' is
*arbitrarily* determined to be the same as 'total number of amino acids'
(or insignificantly different from it) whenever you *arbitrarily* want
that protein system to be unevolvable? Or to the fact that you are
creating a denominator which pretends that evolution works by searching
through total sequence space?

> I'm interested in the total number of amino acids required for a
> particular type of function to be realized at its most minimum
> beneficial level of function. This level is very different depending
> on the type of function in question.

No. It is only a 'very different' level when you want a system to be
declared 'unevolvable'. In every system where you cannot dispute that a
new function evolved (by one or a few mutational steps), you arbitrarily
declare that, because this change did not require "thousands of amino
acids", it does not count as whatever you think evolution involves and
one does not make the same calculation with a denominator that uses the
total amino acid number.

> Those types of new functions
> that require more than a couple thousand fairly specified amino acids
> working together at the same time simply do not evolve no matter what
> you start with, functional or not.

How do you go about determining the difference between 'those types of
new functions that require more than a couple of thousand fairly
specified amino acids working together at the same time' and 'those
types of new functions' that don't? I cannot think of *any* system that
requires "a couple of thousand fairly specified amino acids working
together at the same time". Can you (and you actually need to *justify*
the claim)?

>>That denominator is utterly
>>irrelevant *unless* your model is that every new functional protein
>>arises by a random walk from a random protein sequence. The *real*
>>model of evolution assumes a quite different mechanism, the modification
>>of a pre-existing protein (or duplicate thereof) in a specific organism.
>
>
> That's exactly the model that I'm talking about.

I agree that the *first* model is the one you have been using.

> Starting with
> pre-existing proteins having pre-existing individual and collective
> beneficial functions, you will not see the evolution of any new type
> of protein function that requires more than a couple thousand fairly
> specified amino acids working together at the same time.

Why would anyone expect to see this? I cannot think of any realistic
evolutionary model of any biological system that requires a couple of
thousand amino acid changes.

> You can use
> duplication, point mutation, translocation, frame shifts, etc., and
> they will all fail to get you a new type of function that goes very
> far beyond the lowest levels of functional complexity toward any new
> type of function.

All evolution is at the lowest levels of functional complexity. All
evolution involves duplication, point mutation, translocation, frame
shifts, etc.

> Simple up-regulation of what you already have will
> only get you so far. Evolving new sequences with new functions just
> doesn't happen beyond very low levels of functional complexity.

One simply does not evolve "new sequences" by starting with a protein
which is thousands of amino acids away from the end point before there
is any selectable function. Your hypothetical system simply does not
exist in nature.

>>The sequence space that is *relevant* to evolutionary mechanism is the
>>sequence space encoded by the proteins in that organism.
>
>
> Not so. The sequence space that is relevant to the evolution of a
> particular gene pool is the sequence space that surrounds all possible
> beneficial functions that could be used by that type of organism in
> its current environment at various levels of functional complexity.

Nope. The sequence space that is relevant to evolutionary mechanisms is
the sequence space immediately near (within a few mutational steps of)
the existing genome in that organism. Period. End of story.

> Remember, the sequence space changes exponentially depending upon the
> level of complexity in question.

Since you are unable to quantify "level of complexity", the point is moot.

>>That is, the
>>only relevant question is if, in *this* non-random sequence space, there
>>is a sequence x number of changes away from a selectable functionality
>>required by an environmental change.
>
>
> And the lower the level of functional complexity the more of such
> functions there will be within a couple steps of what happens to
> already be there in the gene pool. However, regardless of the
> starting point, the average distance to those functions at higher and
> higher levels of complexity increases exponentially.

The *only* way this would make sense is if your 'new' function were a
teleologically determined goal toward which all changes were made. That
is not how evolution works.

> This means that
> no matter what life form or gene pool you start with, it will be
> limited in its evolutionary potential to only those functions that
> require, at minimum, no more than a couple thousand fairly specified
> amino acids working together at the same time. The reason for this is
> that you just will never find an organism that just happens to be only
> a few steps away from any beneficial function at such a level of
> complexity even if you tried out zillions of organisms and gene pools
> over zillions of years of time.

You have not even convinced me that there *are* functions that require
"a couple of thousand fairly specified amino acids", all of which must
be 'just so' so that all of them work "together at the same time". You
have proposed that the bacterial flagellum is such a system, but I have
yet to see how you calculated the 'minimum number of amino acids' needed
and what effect the existence of a TTSS as a precursor system would have
on that 'minimum number of amino acids', given that many of the proteins
perform the same secondary functions in both (e.g., bind to the same or
very similar other proteins, export proteins, etc.).

>>The sequence space in any given
>>organism is most certainly not *even* a random sample of total sequence
>>space. Far from it. Almost every one of the proteins has some
>>functional utility already. That alone makes your "ratio", based as it
>>is on a denominator of *total sequence space*, GIGO.
>
>
> Certainly the proteins that a creature starts with are all pretty much
> beneficial - in good working order. I don't understand how you think
> that I am arguing against this concept.

I am pointing out that sequence space which is not a random sample of
total sequence space has no relationship to the denominator you use in
your calculations. That denominator is predicated upon the phoney,
irrelevant straw man idea that evolution proceeds by starting with a
sequence thousands of amino acids away from the particular teleologic
system you think must be the goal. And that getting to that teleologic
end point from the starting point of a random sequence involves a random
walk without any possibility of intermediate functionality. That is a
dishonest representation of evolution on several points: 1) That the
starting point is an effectively random sequence thousands of amino
acids away from the end point. 2) That there are no possible states of
intermediate utility between the random starting point and the end
point. 3) That the end point is a teleological end point.

> I'm clearly not arguing
> against this at all. In fact I use this concept as the basis for my
> argument. You start with something that is fully functional. All the
> proteins are working in a very beneficial way. Now, get something new
> - a new type of function. That is the goal of evolution, new types of
> beneficial functions. For a while evolution does pretty well - at the
> lowest levels of functional complexity.

And it does so without needing to change 'thousands of amino acids'.
Part of the reason is because *no* single, *immediate* function of any
single protein's or protein complex's *active sites* involves thousands
of amino acids. Each active site on a protein or on protein complexes
involves only a few amino acids. Only a few amino acids on a protein
are involved in binding a substrate or another protein. Larger
'functions' involve a number of *independent* binding sites and active
sites, each of which can evolve *independently* of the other sites and
do so in a *stepwise* fashion. Because of the *independence* of these
sites, states of *intermediate* utility are possible. One can have an
intermediate state of functional utility that does not involve *all* the
proteins of the current flagellum. And then one can change a different
protein that has an independent functionality so that it binds to the
complex of the proto-flagella that has a non-flagellar functional
utility by changing, not thousands, but only a few, amino acids.
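
The arithmetic behind "independent" and "stepwise" can be illustrated with a minimal sketch, assuming a hypothetical system of five sites with four critical residues each. It further assumes that each site, once found, is selectable and retained on its own. That assumption is the claim being argued here, not something the code demonstrates, and the function names are illustrative.

```python
ALPHABET = 20  # the standard amino acids


def simultaneous_search(sites, residues_per_site):
    """Size of the space if every critical residue at every site must be right at once."""
    return ALPHABET ** (sites * residues_per_site)


def stepwise_search(sites, residues_per_site):
    """Sum of the per-site spaces if each site can be found and kept independently."""
    return sites * ALPHABET ** residues_per_site


sites, residues = 5, 4  # hypothetical figures chosen only for illustration
print("all at once:", simultaneous_search(sites, residues))
print("one site at a time:", stepwise_search(sites, residues))
```

When intermediates are retained, the per-site spaces add rather than multiply, which is why the existence or absence of states of intermediate utility matters far more than the total residue count.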

> However, with each step up
> the ladder of functional complexity (i.e., each additional fairly
> specified minimum amino acid requirement), evolution does
> exponentially worse and worse until it completely stalls out well
> before the level of just a couple thousand fairly specified amino
> acids are reached. New functions at such a level and beyond just do
> not evolve in any creature no matter what their original starting
> point was and no matter if trillions upon trillions of years are
> provided.

We can both agree that they would not evolve *if* evolution worked the
way you claim it does. But that is a bogus straw man version of
evolution that has no relationship to the *real* model of evolution.
That is, your calculations (particularly the denominator) are utterly
irrelevant GIGO nonsense.

Your straw man model of evolution is basically the molecular equivalent
of the idea of a lizard giving birth to a bird. That is, the idea that
a randomly chosen animal must be able to give birth to another
teleologically determined random functional animal without any
possibility of a functional intermediate. You can see a lizard giving
birth to a slightly modified lizard (say with extra claws). But you
just cannot see a lizard giving birth to a bird. And since we cannot
produce this in the lab in five years, that proves that evolution
doesn't work. GIGO and more GIGO.
>
> Sean
> www.naturalselection.0catch.com
>

Sean Pitman

Dec 30, 2003, 12:25:23 PM
RobinGoodfellow <lmuc...@yahoo.com> wrote in message news:<bsd7ue$r1c$1...@news01.cit.cornell.edu>...


> It is even worse than that. Even random walks starting at random points
> in N-dimensional space can, in theory, be used to sample the states
> with a desired property X (such as Sean's "beneficial sequences"), even
> if the number of such states is exponentially small compared to the
> total state space size.

This depends upon just how exponentially small the number of
beneficial states is relative to the state space. It also depends
upon how fast this space is searched through. For example, if the
ratio of beneficial states to non-beneficial states is as high as say,
1 in 1e12, and if 1e9 states are searched each second, how long will
it take, on average, to find a new beneficial state? It will take
about 1,000 seconds - roughly 17 minutes on average. But,
what happens if at higher levels of functional complexity the density
of beneficial functions decreases exponentially with each step up the
ladder? The rate of search stays the same, but the junk sequences
increase exponentially and so the time required to find the rarer and
rarer beneficial states also increases exponentially.
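
The arithmetic in this paragraph can be checked with a short sketch. It assumes, as the paragraph does, blind uniform sampling in which every state tried is independently beneficial with probability equal to the density; the function name and the factor of 1000 per step are made up for illustration. Whether this sampling model applies to evolution at all is exactly what is disputed in this thread.

```python
def expected_search_seconds(density, states_per_second):
    """Mean waiting time for the first hit under blind uniform sampling.

    Assumption: each sampled state is beneficial with probability
    `density`, independently, so the mean number of trials is 1/density.
    """
    return 1.0 / (density * states_per_second)


# Figures used in the post: 1 beneficial state in 1e12, 1e9 states per second.
seconds = expected_search_seconds(1e-12, 1e9)
print(seconds, "seconds, i.e. about", round(seconds / 60, 1), "minutes")

# If the density drops by a constant factor per step of "complexity",
# the mean search time grows by that same factor per step.
# The factor of 1000 is purely illustrative, not taken from the thread.
for step in range(4):
    density = 1e-12 / 1000 ** step
    print("step", step, "->", expected_search_seconds(density, 1e9), "seconds")
```

The sketch only restates the waiting-time formula; it says nothing about how densities actually change with functional complexity.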

> Such random walks are at the heart of
> Monte-Carlo methods, used to solve a wide variety of problems in
> physics, statistics, computer science, etc. The time requirements
> for such a random walk would depend on the distribution of valid states
> (i.e. "beneficial sequences") in the space, the transition probabilities
> between each state, and, to a lesser extent, the starting point.

Exactly. And the density of "beneficial sequences" is inversely
related, in an exponential manner, to the level of minimum
informational complexity required for these functions to work at a
minimum level of beneficial function.

> Of
> course, the size of each state (i.e. the dimension of the space) is also
> a factor, but the key point (and what makes Monte-Carlo techniques so
> useful) is that the relationship between time and state size need not be
> exponential. Depending on the specific details described above, the
> time requirement may be any function of state size - possibly even a
> linear function!

No, go and check these formulas again and then show me how they are
"linear" with increasing minimum state sizes. They are not linear,
but exponential relationships. However, even if they actually were
linear as you suggest, this would still pose a significant problem to
evolution beyond a certain point of informational complexity. Even a
linear decrease in density with increasing minimum space size would
result in a linear increase in required time to find new functions at
that level of complexity.

> Again, I totally agree that a simple Monte-Carlo
> process is a pitiful model for evolution - but Sean's statistics are way
> off even when applied to this model.

I fail to see how you have supported this statement of yours. My
statistics do seem to match not only the exponentially increasing
ratios found in language systems like English and information systems
like functional proteins and genes, but they also match statistical
programs used in computer software development and the like. This is
why computers cannot evolve their own software programs beyond the
lowest levels of functional complexity. To go very far beyond the
informational complexity that they already have they require the
intelligence and creativity of human programmers to get across these
vast neutral gaps that simply cannot be searched out in any sort of
reasonable amount of time by mindless processes. So, please do show me
how your Monte-Carlo technique can search an increasing state space
and find beneficial states in a linear fashion with each increase in
the minimum informational complexity requirement.

> His probabilities would only be
> valid if evolution worked by repeatedly generating N-amino-acid
> sequences de novo *every time*, with selection only keeping "beneficial"
> sequences. That is, Sean's calculation reflects the probability of
> finding a desired state with property X by blind, uniform random
> sampling of N-dimensional space. The best thing that could be said
> about such an attempt to model evolution is that it is laughable.

How is this laughable when you evolutionists can't seem to come up
with any other way to explain how new types of functions that require
at least a couple thousand fairly specified amino acids working
together at the same time can evolve? What method do you propose to
explain such levels of functional diversity within living things? How
do you get from one type of function at such a level to another type
of function within this same level of specified complexity?

> But
> it appears that Sean does not realize that he is making this mistake,
> and keeps repeating his probabilities like a broken record.

And you guys keep repeating your non-supported assertions like a
mantra. You keep saying I'm crazy and that my ideas are laughable,
but you have presented nothing to significantly counter my position.
My hypothesis remains untouched and my predictions still hold. What
have you presented besides a bunch of non-supported "just-so" and
"trust me" statements? Where is your falsifiable evidence? The best
that I can see is that you guys keep falling back on the philosophical
position that given enough time anything is possible via the
extraordinary creativity of The Mindless - even beyond the most
miraculous creations of mankind.

Basically evolution explains everything - even without demonstration -
and therefore nothing. It is a weak historical hypothesis at best.
It is not falsifiable by any sort of real time genetic experiment -
such as a Pasteur-like experiment. Every time a prediction fails, you
evolutionists just fall back on your philosophy and say, "Oh well, I
guess that particular level of evolution just requires millions of
years - but it certainly happened within 4 billion years that's for
sure!" Really, there is no way to falsify such a philosophical
position. Statistically you have nothing. Statistically it is very
clear that evolution, as an explanation for the variety and levels of
functional complexity that we find in all living things, is simply
untenable. Of course, you are free to hold whatever philosophical
position that you want, but if you hope to convince those who actually
wish to consider the statistical problems involved, you will have to
do much better than you have done so far to hold onto your illusions
of "scientific superiority".

Sean
www.naturalselection.0catch.com

Howard Hershey

Dec 31, 2003, 3:29:00 PM

Sean Pitman wrote:
>
> RobinGoodfellow <lmuc...@yahoo.com> wrote in message news:<bsd7ue$r1c$1...@news01.cit.cornell.edu>...

Talking with Sean *really* is like an absurdist play written by Genet.
Let's call it "Evidence for Godot."

Waiting bum #1: See that cow at the top of the hill. God lifted him up there.

Waiting bum #2: Huh? What makes you think that there is no natural
mechanism that would allow a cow to reach the top of the hill?

Bum #1: That hill must be a 'thousand' feet high. There is no way that
a cow can jump a thousand feet from the valley to the top of the
mountain. Cows simply do not have the muscle power to do that. I can
prove it. Therefore it is impossible for a cow to reach the top by
randomly jumping up with no possible places of intermediate resting. A
cow simply cannot jump up a thousand times and rest in thin air to make
the next jump. Thus, this proves that it is impossible for a cow to
reach the top of the mountain.

Bum #2: What makes you think that the mechanism involved jumping
directly from the valley to the top of the mountain with no possible
intermediate stopping points?

Bum #1: The cow would have to leap a 'thousand' feet high to reach the
top of that mountain. The taller the mountain, the higher the cow has
to jump. I can maybe see a cow jumping 480 feet, but a thousand feet is
exponentially more difficult. And 2,000 feet is impossible. Nope.
There is no way that a cow can jump even a thousand feet in a single
bound with no intermediate resting places. And the fact that you cannot
produce any evidence of a cow jumping a thousand feet in the air in 30
seconds is evidence that it is impossible for a cow to reach the top of
the mountain unless God gives it a lift.

Bum #2: Well, I agree. The *mechanism* you propose *would* be
impossible. But what is to prevent a cow reaching the top of the
mountain from a place one (or even two) foot below the top? What is to
prevent the cow from walking (even randomly) up the hill in a stepwise
fashion? There seem to be a number of resting places on the side of the hill.

Bum #1: Well, I don't see any cow one foot below the top, do you?
Therefore it is impossible for a cow to ever have been one foot below
the top. And the cow would have had to jump from the valley to that
spot just below the top in any case. What evidence do you have that
there are intermediate places where cows have been?

Bum #2: There is a potential pathway up the hill that I can see. And
there are cattle footprints at some of the intermediate sites that look
just like the footprints of the cow at the top.

Bum #1: You keep waving these hypothetical pathways and hoofprint
similarities as if they were actual evidence that the cow at the top
passed that way. What you need to do to show that a cow can reach the
top of the mountain naturally is to have that cow down there jump up to
the top in 30 seconds with no intermediate resting places. Now that
*would* be proof that a natural mechanism is possible.

Bum #2: That's silly. I am explicitly rejecting your mechanism in
favor of a quite different, but still natural, one.

Bum #1: But you still haven't demonstrated that a cow can jump from the
bottom of the hill to the top in a single bound. Therefore God must
have lifted the cow from the valley to the hilltop.

Bum #2: Well, if my idea is true, I would expect to see some evidence
that a cow has passed this way, some evidence that a cow was at
intermediate steps along the pathway that I can see. Perhaps cow
patties. Even a pile of shit would be more evidence than you are
presenting for your Goddidit alternative. And I certainly agree that
your 'natural' alternative mechanism of a single leap is a pile of shit
and unlikely.

Bum #1: I am quite satisfied that at heights of 480 feet, a cow can jump
to the top of the mountain, but that the process gets much more
difficult when the height is 1,000 feet and completely impossible at
2,000. The difficulty of the jump increases exponentially with height.

Bum #2: I don't even think 480 feet is doable by the 'natural'
mechanism you propose, which is merely God lifting the animal from
valley to mountaintop, but without the God. I think an entirely
different mechanism (stepwise walking upslope) was involved. Moreover,
I bet there is some evidence (e.g., even a simple pile of shit would be
more than you present) that the stepwise mechanism is the correct one.
I think I will walk up the mountain myself and see if there is such evidence.

Bum #1: You won't find any because the cow had to get up the mountain
in one giant leap for cowkind. I will wait here for Godot to lift a cow
up again.

Exit Bum #2. AFAIK, Bum #1 is still waiting for Godot to show him a miracle.


>
> > It is even worse than that. Even random walks starting at random points
> > in N-dimensional space can, in theory, be used to sample the states
> > with a desired property X (such as Sean's "beneficial sequences"), even
> > if the number of such states is exponentially small compared to the
> > total state space size.
>
> This depends upon just how exponentially small the number of
> beneficial states is relative to the state space.

Your argument is also completely irrelevant. All calculations based on
such a denominator are GIGO and irrelevant to any real evolutionary
mechanisms. The only time such a calculation would have any value at all
would be wrt the initial steps in abiogenesis. At that point, the
frequency at which specific relevant 'functions' like polynucleotide
kinase activity or RNA ligase activity occur in randomly generated RNAs
of, say, 50 nt lengths, *would* be relevant. The *experimental*
evidence indicates that such activities (not optimal, but in the kingdom
of the blind...) occur with a frequency of 1/10^16 or 1/10^17 molecules.
That means that *all* of these functions/activities would be present in
less than a micromole (a mole is Avogadro's number or about 6 x 10^23
molecules) of such randomly generated RNA molecules.
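
As a quick check of the micromole figure, the sketch below counts how many molecules with a given activity would be expected in one micromole of random RNA, taking the frequencies quoted above at face value. The function name is hypothetical, and the calculation assumes each molecule has the activity independently with the stated frequency.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (defined SI value)


def expected_active_molecules(moles, frequency):
    """Expected number of molecules showing an activity in a random pool.

    Assumption: each molecule independently has the activity with
    probability `frequency`.
    """
    return moles * AVOGADRO * frequency


# Frequencies quoted in the post, applied to one micromole (1e-6 mol).
for frequency in (1e-16, 1e-17):
    count = expected_active_molecules(1e-6, frequency)
    print(f"frequency {frequency:g}: about {count:.0f} active molecules per micromole")
```

Under these assumptions a micromole holds several to several dozen such molecules, which is consistent with the statement that less than a micromole would suffice.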

> It also depends
> upon how fast this space is searched through. For example, if the
> ratio of beneficial states to non-beneficial states is as high as say,
> 1 in 1e12, and if 1e9 states are searched each second, how long will
> it take, on average, to find a new beneficial state?

Utterly irrelevant unless one assumes that evolution works by starting
with a random sequence or random unrelated protein and engages in a
completely random walk to the one function. No intelligent person
thinks that is how evolution works.

[snip more stuff that is utterly devoid of any possible relevance to evolution.]



> And you guys keep repeating your non-supported assertions like a
> mantra. You keep saying I'm crazy and that my ideas are laughable,
> but you have presented nothing to significantly counter my position.
> My hypothesis remains untouched and my predictions still hold.

Your hypothesis, "*IF* evolution works by starting with a random
sequence or a random unrelated protein and proceeds by a long random
walk with no intermediate states of utility, evolution will not happen,"
is indeed untouched, and predictions based on those assumptions will
hold. The problem is that that hypothesis is utterly irrelevant to the
way evolution *does* work and is belied by the very structure of genomes
and the relationships between protein sequences, which indicate a
different, stepwise mechanism with intermediate stages of functional utility.
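
The difference between the two mechanisms can be illustrated with a deliberately crude toy model. It is a sketch under stated assumptions, not a model of any real protein or of anything either poster has proposed: the target string is arbitrary, a fixed target is itself a simplification (real evolution has no predetermined end point), and every partial match is simply assumed to be a retained, useful intermediate.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes for the 20 residues


def stepwise_trials(target, rng):
    """Count single-site mutations tried before `target` is reached.

    Toy assumption: a mutation is kept unless it destroys an existing
    match, so every partial match acts as a retained intermediate.
    """
    current = [rng.choice(AMINO_ACIDS) for _ in target]
    goal = list(target)
    trials = 0
    while current != goal:
        pos = rng.randrange(len(goal))
        residue = rng.choice(AMINO_ACIDS)
        trials += 1
        loses_a_match = current[pos] == goal[pos] and residue != goal[pos]
        if not loses_a_match:
            current[pos] = residue
    return trials


rng = random.Random(0)   # fixed seed so the run is repeatable
target = "MKTAYIAKQR"    # arbitrary 10-residue string, not a real protein

# Blind de novo sampling hits one specific 10-mer once per 20**10 draws on average.
blind_mean_draws = len(AMINO_ACIDS) ** len(target)
print("blind sampling, mean draws:", blind_mean_draws)
print("stepwise search, trials   :", stepwise_trials(target, rng))
```

The point of the sketch is only that the cost of a search depends heavily on whether intermediates are retained; it does not show that such intermediates exist for any particular biological system, which is the empirical question argued over in the thread.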

> What
> have you presented besides a bunch of non-supported "just-so" and
> "trust me" statements? Where is your falsifiable evidence?

Where is yours? You need to demonstrate that some system *must* evolve
by the mechanism you propose. That means you need to find a system
where no possible intermediate of utility can exist, find a system which
you know, for a fact, *must have* started with a sequence completely
unrelated to the current sequence and in which only the end result has
selectable activity of *any* sort, and you need to find, demonstrate,
and quantitate how you can determine a 'minimum amino acid number'. We
keep pointing out that the problem with your numbers is that they are
predicated on a false straw man version of evolution. And we keep
pointing out that one does not need to *change* thousands of amino acids
to convert, say, a TTSS to add a crude motility function. Just as one
does not need to *change* thousands of (nor even 480) amino acids to
convert ebg into a protein that has the lactase activity it did not
previously have. Nor does one need to *change* thousands of amino acids
to generate a 'new' motility function in the swarming bacteria. All you
have presented is vague ideas about 'thousands of fairly specified amino
acids' and random walks from random sequences that are utterly
irrelevant.

> The best
> that I can see is that you guys keep falling back on the philosophical
> position that given enough time anything is possible via the
> extraordinary creativity of The Mindless - even beyond the most
> miraculous creations of mankind.

No. We fall back on the real mechanisms of evolution, which bear no
relationship to the ideas included in your calculations. Ideas about
modification of existing proteins of relevance, like modification of
substrate binding in ebg without modification of the active site. Like
duplication and divergence that produced both the specialized (and
without lactase function) ebg and the specialized (and without ebg
function) lacYZA operon. Like internal duplication (as in the human
lactase). Like multiple functionality (like beta-galactosidase). Like
chimera formation and independent utility of different motifs in a
protein (as in the human lactase).


>
> Basically evolution explains everything - even without demonstration -
> and therefore nothing. It is a weak historical hypothesis at best.

Historical hypotheses falsifiably *predict* what is observed (common
descent in sequences, evolution in families, etc.). Intelligent design
predicts whatever one wants it to predict and thus predicts nothing.

> It is not falsifiable by any sort of real time genetic experiment -
> such as a Pasteur-like experiment.

Events that cannot be demonstrated by real time experiments (which
include planet formation and stellar formation and geological layer
formation) are, nonetheless, falsifiable. They are falsifiable by virtue
of making specific predictions which can be tested against observation.

> Every time a prediction fails, you
> evolutionists just fall back on your philosophy and say, "Oh well, I
> guess that particular level of evolution just requires millions of
> years - but it certainly happened within 4 billion years that's for
> sure!"

Rates are important. So are observations. The evidence indicates that
ebg and lac operons (and other members of the family 2 glycoside
hydrolases) are not independent events with each happening by random
chance starting from a random sequence. The evidence indicates that
many of the proteins in bacterial flagella are not independent of
related proteins with independent functionality; that is, they did not
arise from a random sequence. All this evidence specifically tells us
that evolution does NOT involve starting with a random sequence and does
NOT proceed by a random walk through useless sequence space. That is, it
tells us that your argument is against a straw man.

> Really, there is no way to falsify such a philosophical
> position. Statistically you have nothing. Statistically it is very
> clear that evolution, as an explanation for the variety and levels of
> functional complexity that we find in all living things, is simply
> untenable.

No. It tells us that a straw man evolution that works by starting each
protein as a random sequence and having it go to a teleologic end point
via a random walk is untenable. Since that idea is unrelated to any
real evolutionary mechanism, it is irrelevant and the calculations based
upon these assumptions are mere GIGO.

> Of course, you are free to hold whatever philosophical
> position that you want, but if you hope to convince those who actually
> wish to consider the statistical problems involved,

But we are considering the statistical problems you pose and continually
point out that they are irrelevant GIGO based on a straw man idea of
evolution. They don't *even* have relevance to abiogenesis (which did
not involve proteins encoded in DNA). They are utterly without any
redeeming value when applied to recently evolved systems like 'lactase'
or 'motility' or 'heme-binding electron transfer'.

> you will have to
> do much better than you have done so far to hold onto your illusions
> of "scientific superiority".

You have to do much better to present any sort of scientific idea at all.
>
> Sean
> www.naturalselection.0catch.com

Sean Pitman

Dec 31, 2003, 7:18:49 PM
howard hershey <hers...@indiana.edu> wrote in message news:<bssc8f$5sb$1...@hood.uits.indiana.edu>...

> Sean Pitman wrote:
> > howard hershey <hers...@indiana.edu> wrote in message news:<bscr3q$12r$1...@hood.uits.indiana.edu>...
> >
> >>Sean, to make a long story short, the ratio of beneficial versus
> >>non-beneficial you give is utterly irrelevant to anything. The
> >>denominator of your ratio is always *total sequence space* based on
> >>taking 1/20 to the power of the total number of amino acids (or minimum
> >>number, which, in those cases you want to be unevolvable, is always the
> >>same as total number of amino acids).
> >
> >
> > Yes.
>
> "Yes." to what?

Perhaps if you read the very next sentence before writing a whole
non-pertinent paragraph you would have your answer . . .

<snip>

> > I'm interested in the total number of amino acids required for a
> > particular type of function to be realized at its most minimum
> > beneficial level of function. This level is very different depending
> > on the type of function in question.
>
> No. It is only a 'very different' level when you want a system to be
> declared 'unevolvable'.

Not at all. Take for example the cytochrome c function. At minimum
this type of function appears to require at least 80 or so amino acids
in a fairly specified order. This is not so for the lactase function.
Can a functional lactase enzyme work in a beneficial manner in any
life form with only 80 amino acids? I don't think so. It seems like
the lactase function requires more than 400 fairly specified amino
acids (though less specified than in the cytochrome c function) before
this type of function can be realized. Now, can you get the flagellar
motility type of function with only 400-coded amino acid positions
working together at the same time? I don't think so. This type of
function seems to require at least 4,000 to 6,000 fairly specified
amino acids working together at the same time in order for the bare
minimum level of beneficial function of this type to be realized.

So you see, different types of cellular functions do indeed require
different numbers of amino acids at minimum as well as different
degrees of minimum amino acid specificity. You simply cannot deny
this obvious fact. You can try to cover it up with a lot of hand
waving and smoke blowing, as you have tried so valiantly to do, but I
don't think you have a very easy job since this idea is so obviously
true. It is difficult to cover up and dismiss something so clear and
obvious as this. Your efforts to do so only make it clearer.

> In every system where you cannot dispute that a
> new function evolved (by one or a few mutational steps), you arbitrarily
> declare that, because this change did not require "thousands of amino
> acids", it does not count as whatever you think evolution involves and
> one does not make the same calculation with a denominator that uses the
> total amino acid number.

I haven't made any sort of arbitrary declaration. As it turns out,
the only examples that you evolutionists have come up with as
"real-time" examples of evolution in action have not required anything
more than a few hundred loosely specified amino acids working together
at the same time. None of your examples of functions using thousands
of amino acids at the same time actually require that all of these
amino acids be there for minimum beneficial function of that type to
be realized. It is my hypothesized position that the reason why your
examples were only one or two mutational steps away from success is
because the density of beneficial sequences at such low
levels of functional complexity is rather high. However, when you
start talking about systems of function that require, at minimum, a
few thousand fairly specified amino acids working together at the same
time, you simply run out of examples because there just aren't any. I
know this must be frustrating for you, but that is the cold hard fact
of the matter. You evolutionists just don't have anything that goes
very far beyond the lowest levels of functional complexity.

> > Those types of new functions
> > that require more than a couple thousand fairly specified amino acids
> > working together at the same time simply do not evolve no matter what
> > you start with, functional or not.
>
> How do you go about determining the difference between 'those types of
> new functions that require more than a couple of thousand fairly
> specified amino acids working together at the same time' and 'those
> types of new functions' that don't? I cannot think of *any* system that
> requires "a couple of thousand fairly specified amino acids working
> together at the same time". Can you (and you actually need to *justify*
> the claim)?

Yes I can and I have presented such examples over and over again -
such as the flagellar system of bacterial motility. This type of
function requires, at minimum, at least 20 different kinds of proteins
working together at the same time. Each of these proteins is composed
of around 300 fairly specified amino acids on average. This works out
to around 6,000 collective, novel, fairly specified amino acid
positions working together at the same time for this type of function
to be realized at a minimum level of beneficial selectability.

> >>That denominator is utterly
> >>irrelevant *unless* your model is that every new functional protein
> >>arises by a random walk from a random protein sequence. The *real*
> >>model of evolution assumes a quite different mechanism, the modification
> >>of a pre-existing protein (or duplicate thereof) in a specific organism.
> >
> >
> > That's exactly the model that I'm talking about.
>
> I agree that the *first* model is the one you have been using.

No - I'm taking your *second* "true model" of evolution as "true". I
agree that the real model of evolution assumes the modification of a
pre-existing protein (or duplicate thereof) in a specific organism.

That is the model that I'm talking about. That is the model that
cannot evolve very far beyond the lowest levels of complexity from
what it started with. I'm using your model Howard - your "real" model
of evolution. I haven't made up a new model at all. What I am saying
is that your "real" model doesn't work like you think it does.

> > Starting with
> > pre-existing proteins having pre-existing individual and collective
> > beneficial functions, you will not see the evolution of any new type
> > of protein function that requires more than a couple thousand fairly
> > specified amino acids working together at the same time.
>
> Why would anyone expect to see this? I cannot think of any realistic
> evolutionary model of any biological system that requires a couple of
> thousand amino acid changes?

I'm hitting my head against a brick wall here! Come on man! Try and
understand what I'm saying. The system that requires a couple
thousand amino acids at minimum most likely does not require a couple
thousand amino acid changes, starting with something that is already
there, to be realized. However, on average, such a system would
probably require several hundred fairly specified amino acid position
changes to what is already there. Then, a system that requires 10,000
fairly specified amino acids at minimum would probably require, on
average, perhaps as many as 1,000 fairly specified amino acid position
changes. If just 10% of these were neutral changes, on average, a
large colony numbering in the trillions would still take trillions
upon trillions upon trillions of years to evolve even one new type of
function at such a level of minimum informational complexity
(complexity = minimum sequence size plus minimum sequence
specificity).

> > You can use
> > duplication, point mutation, translocation, frame shifts, etc., and
> > they will all fail to get you a new type of function that goes very
> > far beyond the lowest levels of functional complexity toward any new
> > type of function.
>
> All evolution is at the lowest levels of functional complexity.

I couldn't have said it better myself . . .

> All
> evolution involves duplication, point mutation, translocation, frame
> shifts, etc.

Exactly . . .

> > Simple up-regulation of what you already have will
> > only get you so far. Evolving new sequences with new functions just
> > doesn't happen beyond very low levels of functional complexity.
>
> One simply does not evolve "new sequences" by starting with a protein
> which is thousands of amino acids away from the end point before there
> is any selectable function. Your hypothetical system simply does not
> exist in nature.

You're wrong. Such systems do exist in nature and in every living
thing. The average gap to simple functions requiring just 100 or
so loosely specified amino acids may only be 3 or 4 neutral
amino acid changes wide. However, those types of functions that
require a minimum sequence of 1,000aa are separated from everything
that a given cell has by much wider neutral gaps, averaging
say, 30 or 40 neutral positional changes (i.e., representing an
average neutral gap of sequence space of over 1e50 sequences). Then,
when you get up to those functions that require several thousand
fairly specified amino acids at minimum, the average gap may grow to
500 or so neutral changes on average (sequence space of 1e650). Are
you starting to see the problem here?
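
The sequence-space sizes quoted above follow from raising 20 to the number of positions in the gap, which the sketch below works out in base-10 logarithms. The function name is hypothetical. The sketch checks only the arithmetic for a gap of a given width; the gap widths themselves (3 or 4, 30 or 40, 500) are the post's own estimates and are not derived here.

```python
import math


def log10_sequence_space(positions, alphabet_size=20):
    """log10 of the number of distinct sequences over `positions` free sites."""
    return positions * math.log10(alphabet_size)


# Gap widths mentioned in the post.
for gap in (3, 4, 30, 40, 500):
    exponent = log10_sequence_space(gap)
    print(f"{gap:>3} positions -> about 10^{exponent:.1f} sequences")
```

On this arithmetic a 40-position gap spans roughly 10^52 sequences and a 500-position gap roughly 10^650, matching the figures in the post, while a 30-position gap spans closer to 10^39.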

> >>The sequence space that is *relevant* to evolutionary mechanism is the
> >>sequence space encoded by the proteins in that organism.
> >
> >
> > Not so. The sequence space that is relevant to the evolution of a
> > particular gene pool is the sequence space that surrounds all possible
> > beneficial functions that could be used by that type of organism in
> > its current environment at various levels of functional complexity.
>
> Nope. The sequence space that is relevant to evolutionary mechanisms is
> the sequence space immediately near (within a few mutational steps of)
> the existing genome in that organism. Period. End of story.

That would be great if it were true. The fact is that on average, as
you move up the ladder of functional complexity, there are
exponentially fewer and fewer starting points that are anywhere near
any other type of beneficial functional sequence at that level of
functional complexity. There just aren't any sequences within the
gene pool that are only one or two steps away from a new type of
function at such levels of complexity. That is why you don't have any
examples of real time evolution at such levels of complexity. It just
doesn't happen. Period. End of Story.

> > Remember, the sequence space changes exponentially depending upon the
> > level of complexity in question.
>
> Since you are unable to quantify "level of complexity", the point is moot.
>
> >>That is, the
> >>only relevant question is if, in *this* non-random sequence space, there
> >>is a sequence x number of changes away from a selectable functionality
> >>required by an environmental change.
> >
> >
> > And the lower the level of functional complexity the more of such
> > functions there will be within a couple steps of what happens to
> > already be there in the gene pool. However, regardless of the
> > starting point, the average distance to those functions at higher and
> > higher levels of complexity increases exponentially.
>
> The *only* way this would make sense is if your 'new' function were a
> teleologically determined goal toward which all changes were made. That
> is not how evolution works.

Not at all. I'm not talking about any one particular type of
function, but about all types of functions within a given level of
complexity. No new type of function within that level of higher
complexity (requiring a few thousand fairly specified amino acids at
minimum), will be able to evolve given what a genome has to start
with. This is because, on average, what a given genome has to start
with will be hundreds and even thousands of neutral fairly specified
amino acid positional changes away from all other types of functions
within that level of complexity.

Anyway, this is all I have time for today . . .

Sean
www.naturalselection.0catch.com

Howard Hershey

Dec 31, 2003, 8:22:16 PM

Sean Pitman wrote:
>
> howard hershey <hers...@indiana.edu> wrote in message news:<bssc8f$5sb$1...@hood.uits.indiana.edu>...
> > Sean Pitman wrote:
> > > howard hershey <hers...@indiana.edu> wrote in message news:<bscr3q$12r$1...@hood.uits.indiana.edu>...
> > >
> > >>Sean, to make a long story short, the ratio of beneficial versus
> > >>non-beneficial you give is utterly irrelevant to anything. The
> > >>denominator of your ratio is always *total sequence space* based on
> > >>taking 1/20 to the power of the total number of amino acids (or minimum
> > >>number, which, in those cases you want to be unevolvable, is always the
> > >>same as total number of amino acids).
> > >
> > >
> > > Yes.
> >
> > "Yes." to what?
>
> Perhaps if you read the very next sentence before writing a whole
> non-pertinent paragraph you would have your answer . . .
>
> <snip>
> > > I'm interested in the total number of amino acids required for a
> > > particular type of function to be realized at its most minimum
> > > beneficial level of function. This level is very different depending
> > > on the type of function in question.
> >
> > No. It is only a 'very different' level when you want a system to be
> > declared 'unevolvable'.
>
> Not at all. Take for example the cytochrome c function. At minimum
> this type of function appears to require at least 80 or so amino acids
> in a fairly specified order.

I have asked this repeatedly, so now I will shout. HOW THE BLOODY HELL
DO YOU DEFINE WHAT IS OR IS NOT 'FAIRLY SPECIFIED ORDER' AND WHAT VALUE
DO YOU GIVE IT?

> This is not so for the lactase function.
> Can functional lactase enzyme to work in a beneficial manner in any
> life form with only 80 amino acids?

You keep confounding total number of amino acids with number of amino
acids needed to perform a function.

> I don't think so. It seem like
> the lactase function requires more than 400 fairly specified amino
> acids (though less specified than in the cytochrome c function) before
> this type of function can be realized.

So, shouting again, HOW THE BLOODY HELL DID YOU CALCULATE A NUMBER OF
400 AMINO ACIDS? I STRONGLY SUSPECT THAT NUMBER IS NOTHING BUT A WAG
(WILD-ASSED GUESS).

> Now, can you get the flagellar
> motility type of function with only 400-coded amino acid positions
> working together at the same time? I don't think so. This type of
> function seems to require at least 4,000 to 6,000 fairly specified
> amino acids working together at the same time in order for the bare
> minimum level of beneficial function of this type to be realized.

AND AGAIN, HOW THE BLOODY HELL DID YOU CALCULATE THAT NUMBER? I
STRONGLY SUSPECT THAT NUMBER IS NOTHING BUT A WAG. Convince me otherwise.


>
> So you see, different types of cellular functions do indeed required
> different numbers of amino acids at minimum as well as different
> degrees of minimum amino acid specificity.

How the bloody hell can I tell? You simply toss these numbers out
without giving any justification for using them. You don't even tell me
how to determine the number of 'minimal amino acids' needed for
function. You don't even tell me what you mean by function. Is the
part of the human lactase that goes through the membrane performing a
different 'function' than the part that is actively cleaving the
substrate? Is the part that *binds* the substrate performing a
different function than the part that is actively acting as the
nucleophile? If I can change one without affecting the other, is that a
change in function?

> You simply cannot deny
> this obvious fact. You can try to cover it up with a lot of hand
> waving and smoke blowing, as you have tried so valiantly to do, but I
> don't think you have a very easy job since this idea is so obviously
> true. It is difficult to cover up and dismiss something so clear and
> obvious as this. Your efforts to do so only make it clearer.

You, dear boy, are doing the hand-waving here. You repeatedly toss out
numbers and words without making any attempt to operationally define
them or justify the resulting numbers you get.


>
> > In every system where you cannot dispute that a
> > new function evolved (by one or a few mutational steps), you arbitrarily
> > declare that, because this change did not require "thousands of amino
> > acids", it does not count as whatever you think evolution involves and
> > one does not make the same calculation with a denominator that uses the
> > total amino acid number.
>
> I haven't made any sort of arbitrary declaration. As it turns out,
> the only examples that you evolutionists have come up with as
> "real-time" examples of evolution in action have not required anything
> more than a few hundred loosely specified amino acids working together
> at the same time.

No. All examples of evolution in action involve the modification of
pre-existing systems. That is what descent with (and by) modification
means. It doesn't matter how many amino acids the pre-existing system has.

> None of your examples of functions using thousands
> of amino acids at the same time actually require that all of these
> amino acids be there for minimum beneficial function of that type to
> be realized.

Well, it is damn hard to find the systems you want amidst all the other
systems that evolve just fine. But systems that are thousands of amino
acids long (that is the only operational definition I see you using) can
be modified by a few mutational changes just as easily as systems less
than 100 amino acids long.

> It is my hypothesized position that the reason why your
> examples were only one or two mutational steps away from success is
> because the relative density of beneficial sequences at such low
> levels of functional complexity are rather dense.

No. It is because our systems did not evolve via the mechanism you
claim must exist -- starting with a random sequence and proceeding via a
random walk. They started with an 'ancestral' sequence which was then
'modified' by a few simple mutations. The number of amino acids in the
'ancestral' or 'final' systems is completely irrelevant to that
mechanism. The reason the relative density of beneficial sequences is
dense is because of the non-random nature of the starting point. Period.

> However, when you
> start talking about systems of function that require, at minimum, a
> few thousand fairly specified amino acids working together at the same
> time, you simply run out of examples because there just aren't any.

There are damn few such systems at all. *Even* if you count every
single amino acid in the proteins as being 'fairly specified' (whatever
that means).

> I
> know this must be frustrating for you, but that is the cold hard fact
> of the matter. You evolutionists just don't have anything that goes
> very far beyond the lowest levels of functional complexity.

I have no idea how you *quantify* the level of functional complexity,
given that most of the active sites act independently of each other.


>
> > > Those types of new functions
> > > that require more than a couple thousand fairly specified amino acids
> > > working together at the same time simply do not evolve no matter what
> > > you start with, functional or not.
> >
> > How do you go about determining the difference between 'those types of
> > new functions that require more than a couple of thousand fairly
> > specified amino acids working together at the same time' and 'those
> > types of new functions' that don't? I cannot think of *any* system that
> > requires "a couple of thousand fairly specified amino acids working
> > together at the same time". Can you (and you actually need to *justify*
> > the claim)?
>
> Yes I can and I have presented such examples over and over again -

And over and over again you have failed to do anything beyond
hand-waving arguments tossing out total amino acid numbers as
justification for this. How *do* you determine what counts as a 'fairly
specified amino acid'? How *do* you determine that the amino acids that
bind one protein to another are 'working together at the same time'
rather than just independently doing their thing? By nothing but empty
verbiage and bogus hand-waving numbers.

> such as the flagellar system of bacterial motility. This type of
> function requires, at minimum, at least 20 different kinds of proteins
> working together at the same time. Each of these proteins is composed
> of around 300 fairly specified amino acids on average. This works out
> to around 6,000aa collective, novel, fairly specified amino acid
> positions working together at the same time for this time of function
> to be realized at a minimum level of beneficial selectability.
>
> > >>That denominator is utterly
> > >>irrelevant *unless* your model is that every new functional protein
> > >>arises by a random walk from a random protein sequence. The *real*
> > >>model of evolution assumes a quite different mechanism, the modification
> > >>of a pre-existing protein (or duplicate thereof) in a specific organism.
> > >
> > >
> > > That's exactly the model that I'm talking about.
> >
> > I agree that the *first* model is the one you have been using.
>
> No - I'm taking your *second* "true model" of evolution as "true".

So you say. But that is at variance with reality.

> I
> agree that the real model of evolution assumes the modification of a
> pre-existing protein (or duplicate thereof) in a specific organism.
> That is the model that I'm talking about. That is the model that
> cannot evolve very far beyond the lowest levels of complexity from
> what it started with. I'm using your model Howard - your "real" model
> of evolution. I haven't made up a new model at all. What I am saying
> is that your "real" model doesn't work like you think it does.

The above 'words' are at total variance with the truth. The model you
are proposing *is* the first one, the one that involves a random walk
from a random sequence. That is the truth of what your math is saying.
It is your words that are not telling the truth. I don't know whether
that is from your ignorance wrt what your math is saying or your
ignorance wrt to what your words are saying. But there is a discrepancy.

> > > Starting with
> > > pre-exiting proteins having pre-existing individual and collective
> > > beneficial functions, you will not see the evolution of any new type
> > > of protein function that requires more than a couple thousand fairly
> > > specified amino acids working together at the same time.
> >
> > Why would anyone expect to see this? I cannot think of any realistic
> > evolutionary model of any biological system that requires a couple of
> > thousand amino acid changes?
>
> I'm hitting my head against a brick wall here! Come on man! Try and
> understand what I'm saying. The system that requires a couple
> thousand amino acids at minimum most likely does not require a couple
> thousand amino acid changes, starting with something that is already
> there, to be realized. However, on average, such a system would
> probably require several hundred fairly specified amino acids position
> changes to what is already there.

Why do you think that? That is utterly stupid.

> Then, a system that requires 10,000
> fairly specified amino acids at minimum

And where the bloody hell is there a system that *requires* 10,000
bloody 'fairly specified' (whatever that means) amino acids? I think
all such systems are purely imaginary.

> would probably require, on
> average, perhaps as many as 1,000 fairly specified amino acid position
> changes. If just 10% of these were neutral changes, on average, a
> large colony numbering in the trillions would still take trillions
> upon trillions upon trillions of years to evolve even one new type of
> function at such a level of minimum informational complexity
> (complexity = minimum sequence size plus minimum sequence
> specificity).

The above is a statement that you are starting from a completely random
sequence at least 1000 amino acid *changes* away from the end point.
That is equivalent to a claim that the world was destroyed by aliens
last week and we are all now mental images in their computers.
Completely divorced from reality.

So where are these magical systems? And don't just *say* the bacterial
flagella and hand-wave meaningless numbers based on your WAGs wrt the
number of proteins and total number of amino acids. Convince me that
any possible pathway to the current flagella requires *at least one*
proposed step in the evolution of the bacterial flagella that *requires*
starting with a protein that is more than 1000 *required* amino acids
away from the current state. Hell, more than 100 amino acids away.


>
> > >>The sequence space that is *relevant* to evolutionary mechanism is the
> > >>sequence space encoded by the proteins in that organism.
> > >
> > >
> > > Not so. The sequence space that is relevant to the evolution of a
> > > particular gene pool is the sequence space that surrounds all possible
> > > beneficial functions that could be used by that type of organism in
> > > its current environment at various levels of functional complexity.
> >
> > Nope. The sequence space that is relevant to evolutionary mechanisms is
> > the sequence space immediately near (within a few mutational steps of)
> > the existing genome in that organism. Period. End of story.
>
> That would be great if it were true. The fact is that on average, as
> you move up the ladder of functional complexity, there are
> exponentially fewer and fewer starting points that are anywhere near
> any other type of beneficial functional sequence at that level of
> functional complexity.

These are meaningless terms. How do you quantitate "level of functional
complexity"? All I see is that it is purely the total number of amino
acids, or, if you are feeling generous, about 50% of that value. No
analysis or anything. Sheer handwaving numerology.

> There just aren't any sequences within the
> gene pool that are only one or two steps away from a new type of
> function at such levels of complexity. That is why you don't have any
> examples of real time evolution at such levels of complexity. It just
> doesn't happen. Period. End of Story.

How did you determine this? By your GIGO numbers based on the
assumption that evolution works by starting with some random
(nonfunctional?) sequence thousands of nucleotides away from the end point?

> > > Remember, the sequence space changes exponentially depending upon the
> > > level of complexity in question.
> >
> > Since you are unable to quantify "level of complexity", the point is moot.

What, no answer to this?


> >
> > >>That is, the
> > >>only relevant question is if, in *this* non-random sequence space, there
> > >>is a sequence x number of changes away from a selectable functionality
> > >>required by an environmental change.
> > >
> > >
> > > And the lower the level of functional complexity the more of such
> > > functions there will be within a couple steps of what happens to
> > > already be there in the gene pool. However, regardless of the
> > > starting point, the average distance to those functions at higher and
> > > higher levels of complexity increases exponentially.
> >
> > The *only* way this would make sense is if your 'new' function were a
> > teleologically determined goal toward which all changes were made. That
> > is not how evolution works.
>
> Not at all. I'm not talking about any one particular type of
> function, but about all types of functions within a given level of
> complexity.

And what the hell is that supposed to mean?

> No new type of function within that level of higher
> complexity (requiring a few thousand fairly specified amino acids at
> minimum), will be able to evolve given what a genome has to start
> with.

And how did you determine this?

> This is because, on average, what a given genome has to start
> with will be hundreds and even thousands of neutral fairly specified
> amino acid positional changes away from all other types of functions
> within that level of complexity.

IOW, your model is that evolution starts with some random sequence some
thousands of nucleotides away from any useful function and then proceeds
there by a random walk. Except that you disagree and claim you are not
proposing that. That makes your position as clear as the Mississippi.

Howard Hershey

Jan 1, 2004, 11:49:49 AM

Howard Hershey wrote:
>
> Sean Pitman wrote:
> >
> > howard hershey <hers...@indiana.edu> wrote in message news:<bssc8f$5sb$1...@hood.uits.indiana.edu>...
> > > Sean Pitman wrote:
> > > > howard hershey <hers...@indiana.edu> wrote in message news:<bscr3q$12r$1...@hood.uits.indiana.edu>...
> > > >
> > > >>Sean, to make a long story short, the ratio of beneficial versus
> > > >>non-beneficial you give is utterly irrelevant to anything. The
> > > >>denominator of your ratio is always *total sequence space* based on
> > > >>taking 1/20 to the power of the total number of amino acids (or minimum
> > > >>number, which, in those cases you want to be unevolvable, is always the
> > > >>same as total number of amino acids).
> > > >

[snip]


> > <snip>
> > > > I'm interested in the total number of amino acids required for a
> > > > particular type of function to be realized at its most minimum
> > > > beneficial level of function. This level is very different depending
> > > > on the type of function in question.

I am making the claim that you only use your method when you want to
demonstrate a large number.

Then let's use *your* method to analyse the probability that the ebg
genes can evolve into a beta-galactosidase.

1) You seem to agree that the native ebga does not have any selectable
lactase activity. Thus generating selectable lactase activity from ebg
is generating a 'new' function. Is that right?

2) You do agree that the native ebg involves a two peptide system, with
ebga being 1030 amino acids long and ebgc being 149 amino acids long,
both being required for function, plus a regulatory protein (ebgr) which
is also around 1000 amino acids long.

3) You seem to think that knowing the total length of the proteins
involved (in this case, about 1200 for the two that act together at the
same time) and how many proteins are involved in the system (2) allows
you to determine the number of amino acids that are 'fairly specified'
and the 'level of complexity'. Please perform this mathematics on the
ebg system for me. If you cannot calculate "the total number of amino
acids required for a particular type of function to be realized at its
most minimum beneficial level of function" for a simple system like ebg,
what makes you think you can do so for a larger or more complex one?

4) After calculating "the total number of amino acids required for a
particular type of function to be realized at its most minimum
beneficial level of function" (you claim to be able to do so -- at least
I have seen you give estimates of around 480 or so for ebg, but it would
certainly be nice to see what went into the calculation) I want you to
calculate the odds of ebg evolving into a selectable beta-galactosidase
enzyme *based solely on these numbers* and NOT based on any other
knowledge. This estimate of the odds of generating functional lactase
activity from ebg would be called a 'prediction' of *your* 'hypothesis'.

5) Now let's take a different protein or protein system, also 1200 total
amino acids in length. We will make it a bit less complex, by making it
a single unregulated protein. But this protein is in the histidine
pathway. That is, it is a random protein wrt lactase function, chosen
merely because of total amino acids present. Let's say that for *its
function* the very same "total number of amino acids required for a
particular type of function to be realized at its most minimum
beneficial level of function" exists. I want you to calculate the odds
of this protein evolving into a selectable beta-galactosidase enzyme
*based solely on the numbers you think are important* and NOT based on
any other knowledge about this protein.

6) Same thing, except now we have a completely random sequence of 1200
total amino acids.

I will accept failure to evolve a selectable beta-galactosidase activity
in five years as evidence that your math is correct for *that type of
protein* (even though I really should wait a gazillion years, just to be sure).

Oh. Show your math and reasoning. Now I don't want to bias you, but I
strongly suspect that by *your* method of calculating odds there should
be no difference in the odds of any of these proteins evolving a
selectable beta-galactosidase activity that *none* had to begin with
(well, actually your calculation of odds, according to the way it has
been presented here, should favor one of the last two evolving lactase
activity since they are less complex -- being a single protein rather
than two that have to work together).

That is because your bogus math does not take the specific ancestral
sequence and its pre-existing functionality into consideration.
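
To make the point concrete, here is a minimal sketch. It is NOT Sean's actual formula (none has been shown); it is a hypothetical stand-in for any estimate that uses only the total length and a "fairly specified" fraction, with odds taken as 1/20 per specified position. The function name and the 0.4 fraction (480 of 1200, the figure mentioned for ebg) are assumptions for illustration.

```python
import math


def log10_odds_length_only(total_length: int, specified_fraction: float) -> float:
    """Hypothetical length-only estimate (an assumption, not a published method).

    Returns log10 of (1/20) ** specified, where specified is the number of
    'fairly specified' positions. Nothing about the ancestral sequence or
    its existing function enters the calculation.
    """
    specified = round(total_length * specified_fraction)
    return -specified * math.log10(20)


# The three starting points from items 2), 5) and 6), all 1200 residues long
candidates = {
    "ebg (two peptides)": 1200,
    "histidine-pathway protein": 1200,
    "random sequence": 1200,
}
SPECIFIED_FRACTION = 0.4  # assumed: 480 of 1200 positions

estimates = {
    name: log10_odds_length_only(length, SPECIFIED_FRACTION)
    for name, length in candidates.items()
}
for name, value in estimates.items():
    print(f"{name}: about 1 in 10^{-value:.0f}")
```

All three starting points get the identical number, because the only inputs are counts of residues. Any formula with that shape cannot tell ebg apart from an unrelated protein or a random sequence.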

The interesting thing, of course, is that this experiment (selection for
the evolution of galactosidase activity) actually has been run and your
model of calculating the odds has been tested. That is, the prediction
based on your methods of determining the odds of evolving a new function
has been subject to test. Did it pass the test for *all* of the
examples, or only for the examples where you start with a random protein
or a random sequence?

[snip]


> >
> > > In every system where you cannot dispute that a
> > > new function evolved (by one or a few mutational steps), you arbitrarily
> > > declare that, because this change did not require "thousands of amino
> > > acids", it does not count as whatever you think evolution involves and
> > > one does not make the same calculation with a denominator that uses the
> > > total amino acid number.
> >
> > I haven't made any sort of arbitrary declaration. As it turns out,
> > the only examples that you evolutionists have come up with as
> > "real-time" examples of evolution in action have not required anything
> > more than a few hundred loosely specified amino acids working together
> > at the same time.

How do you distinguish between "loosely specified amino acids working
together at the same time" and "fairly specified amino acids working
together at the same time"? By waving your hands and declaring them that?



> No. All examples of evolution in action involve the modification of
> pre-existing systems. That is what descent with (and by) modification
> means. It doesn't matter how many amino acids the pre-existing system has.
>
> > None of your examples of functions using thousands
> > of amino acids at the same time actually require that all of these
> > amino acids be there for minimum beneficial function of that type to
> > be realized.
>
> Well, it is damn hard to find the systems you want amidst all the other
> systems that evolve just fine. But systems that are thousands of amino
> acids long (that is the only operational definition I see you using) can
> be modified by a few mutational changes just as easily as systems less
> than 100 amino acids long.
>
> > It is my hypothesized position that the reason why your
> > examples were only one or two mutational steps away from success is
> > because the relative density of beneficial sequences at such low
> > levels of functional complexity are rather dense.

Well, now you can give us numbers and arguments to justify your
hypothesis rather than just hand wave them away.

[snip]

>
> > > > Starting with
> > > > pre-exiting proteins having pre-existing individual and collective
> > > > beneficial functions, you will not see the evolution of any new type
> > > > of protein function that requires more than a couple thousand fairly
> > > > specified amino acids working together at the same time.
> > >
> > > Why would anyone expect to see this? I cannot think of any realistic
> > > evolutionary model of any biological system that requires a couple of
> > > thousand amino acid changes?
> >
> > I'm hitting my head against a brick wall here! Come on man! Try and
> > understand what I'm saying. The system that requires a couple
> > thousand amino acids at minimum most likely does not require a couple
> > thousand amino acid changes, starting with something that is already
> > there, to be realized. However, on average, such a system would
> > probably require several hundred fairly specified amino acids position
> > changes to what is already there.

And what does this have to do with your calculation of the odds? It
seems that you think your calculation says something and then, at the
last minute, you toss it out and make a WAG about how many amino acids
need to change.

> > Then, a system that requires 10,000
> > fairly specified amino acids at minimum
>
> And where the bloody hell is there a system that *requires* 10,000
> bloody 'fairly specified' (whatever that means) amino acids? I think
> all such systems are purely imaginary.

Space for a list of all cellular systems that require 10,000 bloody
'fairly specified' amino acids plus the evidence that all 10,000 sites
are 'fairly specified'.


>
> > would probably require, on
> > average, perhaps as many as 1,000 fairly specified amino acid position
> > changes.

Is this a hand-waving WAG or what? I assume that you mean 1,000 changes
*between* positions with selectable activity of any kind, not 500
changes between selectable function A and selectable function B and
another 500 between selectable function B and selectable function C.
Just want to be clear.

[snip]


>
> > > > You can use
> > > > duplication, point mutation, translocation, frame shifts, etc., and
> > > > they will all fail to get you a new type of function that goes very
> > > > far beyond the lowest levels of functional complexity toward any new
> > > > type of function.
> > >
> > > All evolution is at the lowest levels of functional complexity.
> >
> > I couldn't have said it better myself . . .
> >
> > > All
> > > evolution involves duplication, point mutation, translocation, frame
> > > shifts, etc.
> >
> > Exactly . . .
> >
> > > > Simple up-regulation of what you already have will
> > > > only get you so far. Evolving new sequences with new functions just
> > > > doesn't happen beyond very low levels of functional complexity.
> > >
> > > One simply does not evolve "new sequences" by starting with a protein
> > > which is thousands of amino acids away from the end point before there
> > > is any selectable function. Your hypothetical system simply does not
> > > exist in nature.
> >
> > Your wrong. Such systems do exist in nature and in every living
> > thing. The average distance to simple functions requiring just 100 or
> > so loosely specified amino acids sequences may only be 3 or 4 neutral
> > amino acid changes wide.

How does one distinguish between "loosely specified sequences" and
"fairly specified sequences" without waving one's hands furiously?

> > However, those types of functions that
> > require a minimum sequence of 1,000aa are separated by much wider
> > neutral gaps from everything that a given cell has by an average of
> > say, 30 or 40 neutral positional changes (i.e., representing an
> > average neutral gap of sequence space of over 1e50 sequences).

Where is your evidence for this? It seems to me that you are
extrapolating from a mathematical calculation that would make the
evolution of ebg into a selectable lactase essentially improbable.

> > Then,
> > when you get up to those functions that require several thousand
> > fairly specified amino acids at minimum, the average gap may grow to
> > 500 or so neutral changes on average (sequence space of 1e650). Are
> > you starting to see the problem here?
>
> So where are these magical systems? And don't just *say* the bacterial
> flagella and hand-wave meaningless numbers based on your WAGs wrt the
> number of proteins and total number of amino acids. Convince me that
> any possible pathway to the current flagella requires *at least one*
> proposed step in the evolution of the bacterial flagella that *requires*
> starting with a protein that is more than 1000 *required* amino acids*
> away from the current state. Hell, more than 100 amino acids away.

Remember that no one (except your argument) is proposing that bacterial
flagella arose in one fell swoop with no intermediate states of utility.
Quite the opposite.

Actually I agree to some extent. *If* there aren't systems that are
within a few mutational steps of a new selectable function, it won't
happen. That is why states of intermediate utility (but not necessarily
the teleologic or current utility) are considered so important in
producing evolutionary hypotheses (possible pathways). If wings had to
evolve *from nothing* with no utility at all until the organism could
soar like a vulture in the span of a hundred years, wings could not
evolve. But wings did not evolve *from nothing* (forelimb modification
was fairly important) and did not have to generate the soaring ability
of a vulture in the span of a hundred years.



> > > > Remember, the sequence space changes exponentially depending upon the
> > > > level of complexity in question.
> > >
> > > Since you are unable to quantify "level of complexity", the point is moot.
>
> What, no answer to this?

I really would like you to quantify "level of complexity".

Sean Pitman

Jan 1, 2004, 2:28:15 PM
Howard Hershey <hers...@indiana.edu> wrote in message news:<3FF3BC16...@indiana.edu>...


> I have asked this repeatedly, so now I will shout. HOW THE BLOODY HELL
> DO YOU DEFINE WHAT IS OR IS NOT 'FAIRLY SPECIFIED ORDER'
> AND WHAT VALUE DO YOU GIVE IT?

Specified order is defined by the degree of constraints required on
amino acid positions. In other words, how many positions are
completely invariant? How many more positions are partially variant
and to what degree? I have repeatedly talked about this so I am quite
surprised that you don't seem to remember reading such discussions.
Anyway, I will repeat using cytochrome c again as an example:

Based on an analysis of cytochrome c in over 40 species, a minimum of
around 80 amino acids is required for this type of function to be
realized, with around 100 or so amino acids used on average. Of these,
30 amino acid positions are highly constrained - rarely involving
more than one type of amino acid. An additional 36 positions only
vary between 2 or 3 different amino acids. Another 15 positions vary
between no more than 4 different amino acids. Only 4 positions out of
105 positions vary by more than 7 different amino acids. The two most
variable positions (60 and 89) vary by only 9 out of 20 possible amino
acids.

It is quite obvious that the cytochrome c type of function is rather
constrained by both its minimum amino acid requirement as well as the
fairly high degree of specificity required by the sequencing of these
amino acids. Well over 60% of this protein is restrained to within 3
amino acid options out of the 20 that are possible. That is a
significant constraint wouldn't you say? In fact the most generous
estimates of the total number of possible cytochrome c sequences in
sequence space that I have come across, based on these constraints,
suggest no more than 10e60 cytochrome c sequences exist. If you know
something different than I have suggested here, please do show me
otherwise.

Now you ask, "What value do I give such numbers?" What does this
estimate mean? Given that the total number of possible sequences in
sequence space requiring a minimum of 80 amino acids is well over
10e100, the ratio of cytochrome c sequences to non-cytochrome c
sequences is less than 1 in 10e40 sequences. This translates into an
average gap of 30 amino acid changes between various islands of
cytochrome c sequence clusters within sequence space. In other words,
if a particular organism that did not have a cytochrome c function but
would benefit from this type of function, its current proteins would
differ from the nearest potential cytochrome c sequence by an average
of over 30 amino acid positions. Don't you find that rather
significant? If I were you this fact would strike me as quite alarming
considering your belief in the validity of evolution and considering
the fact that far greater levels of specified complexity exist within
all living things.
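
The arithmetic behind these figures can be checked with a short sketch. It assumes that "10e60", "10e100" and "10e40" in the text above mean 10^60, 10^100 and 10^40, and every variable name is illustrative only. The text does not say how the 30-position gap is obtained. Expressing the 1-in-10^40 ratio as an equivalent number of fully specified positions (a base-20 logarithm) is one possible reading that lands near it, and it is labeled as an assumption in the code.

```python
import math

AMINO_ACIDS = 20          # size of the amino acid alphabet
MIN_LENGTH = 80           # minimum cytochrome c length quoted above
LOG10_FUNCTIONAL = 60     # assumption: "10e60" read as 10^60 sequences
LOG10_QUOTED_RATIO = 40   # assumption: "1 in 10e40" read as 1 in 10^40

# log10 of the number of possible sequences of length 80 (20^80)
log10_total = MIN_LENGTH * math.log10(AMINO_ACIDS)

# log10 of the total-to-functional ratio implied by those two numbers
log10_ratio = log10_total - LOG10_FUNCTIONAL

# Assumption: the number of fully specified positions (odds of 1 in 20
# each) whose combined odds match the quoted 1-in-10^40 ratio
equivalent_positions = LOG10_QUOTED_RATIO / math.log10(AMINO_ACIDS)

print(f"20^80 is about 10^{log10_total:.1f}")
print(f"total/functional is about 10^{log10_ratio:.1f}")
print(f"10^40 matches about {equivalent_positions:.1f} specified positions")
```

Under these assumptions the sketch only reproduces the size of the sequence space and the quoted ratio. It says nothing about how sequences with function are distributed within that space.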

file:///C:/Documents%20and%20Settings/Sean/My%20Documents/Evolution/References/280,46,Slide
46

> > This is not so for the lactase function.
> > Can a functional lactase enzyme to work in a beneficial manner in any
> > life form with only 80 amino acids?
>
> You keep confounding total number of amino acids with number of amino
> acids needed to perform a function.

I suggest to you that the minimum number of amino acids needed to
perform a particular type of function is indeed the total number of
amino acids needed to perform that type of function at its minimum
level of beneficial selectability. I fail to understand what you are
trying to get at here.

> > I don't think so. It seem like
> > the lactase function requires more than 400 fairly specified amino
> > acids (though less specified than in the cytochrome c function) before
> > this type of function can be realized.
>
> So, shouting again, HOW THE BLOODY HELL DID YOU CALCULATE
> A NUMBER OF 400 AMINO ACIDS? I STRONGLY SUSPECT THAT
> NUMBER IS NOTHING BUT A WAG (WILD-ASSED GUESS).

If you don't believe this number, it should be a fairly simple thing
for you to disprove this assertion - which is based on my own BLAST
database search of known lactases. Ian Musgrave also suggested, after
his own database search, that the minimum number might be as high as
480 amino acids for the most basic lactase enzyme. Now, if you can
find a lactase enzyme shorter than 400aa actually working in a living
creature I would be very glad to know of it. Until then, your
hollering isn't going to disprove my position or lessen its predictive
power. It would be much more convincing, on your part, if you were to
actually try to find evidence against the stuff that I say instead of
simply yelling out your incredulous one-liners.

> > Now, can you get the flagellar
> > motility type of function with only 400-coded amino acid positions
> > working together at the same time? I don't think so. This type of
> > function seems to require at least 4,000 to 6,000 fairly specified
> > amino acids working together at the same time in order for the bare
> > minimum level of beneficial function of this type to be realized.
>
> AND AGAIN, HOW THE BLOODY HELL DID YOU CALCULATE THAT
> NUMBER? I STRONGLY SUSPECT THAT NUMBER IS NOTHING BUT
> A WAG. Convince me otherwise.

Now this is promising. At least by making this statement you may be
starting to realize that, if my assertions are true, evolution is
in big trouble.

In any case, I notice that you didn't respond when I did roughly
detail how I calculated this number below. How can you yell out this
question after having read what I wrote? I suggest to you that a more
effective response would be to try and counter what I already wrote.
Again, the calculation is based on the following:

This type of function (the flagellar system of motility) requires, at
minimum, 20 different kinds of proteins working together at
the same time. If each of these proteins is composed of around 300
fairly specified amino acids on average, this works out to around
6,000 collective, novel, fairly specified amino acid positions
working together at the same time for this type of function to be
realized at a minimum level of beneficial selectability.

Now there you go. All you have to do is prove that my assertions here
are wrong. Upon what basis am I way off base here? What is the
minimum number of coded amino acid positions that you would suggest in
order to realize this type of function at a minimum level of
beneficial selectability?

And again, that is all I have time for today. Hope you're having or
at least had a very happy New Year! ; )

Sean
www.naturalselection.0catch.com

Howard Hershey

Jan 2, 2004, 9:22:43 AM

Sean Pitman wrote:
>
> Howard Hershey <hers...@indiana.edu> wrote in message news:<3FF3BC16...@indiana.edu>...
>
> > I have asked this repeatedly, so now I will shout. HOW THE BLOODY HELL
> > DO YOU DEFINE WHAT IS OR IS NOT 'FAIRLY SPECIFIED ORDER'
> > AND WHAT VALUE DO YOU GIVE IT?
>
> Specified order is defined by the degree of constraints required on
> amino acid positions. In other words, how many positions are
> completely invariant? How many more positions are partially variant
> and to what degree? I have repeatedly talked about this so I am quite
> surprised that you don't seem to remember reading such discussions.
> Anyway, I will repeat using cytochrome c again as an example:
>
> Based on cytochrome c analysis of over 40 species, a minimum of around
> 80 amino acids are required for this type of function to be realized
> with around 100 or so amino acids used on average.

What do you mean by "required" in this context? That they must be
invariant? That they must be hydrophobic? That they must *not* be
proline? Again, you repeatedly fail to define 'fairly specified' and
how you determine that an amino acid is 'fairly specified'.

> Of these, 30 amino
> acid positions that are highly constrained - rarely involving more
> than one type of amino acid.

Rarely is not never. How many are invariant? It seems to me that you
count an amino acid as 'fairly specified' if it exhibits *any*
constraint and treat it as if it were invariant. Actually, I haven't
seen you calculate anything. All I have seen you do is say that there
are positions that are very highly constrained, positions that are
somewhat less constrained, and positions that are even less constrained.
Then you wave your hands and come up with a number.

Again, it is well known that all *modern* cytochrome c's have a high
percentage of evolutionarily constrained sequences. That is because
cytochrome c is small and most of its amino acids are in contact with
the substrate. The high degree of evolutionary constraint is what makes
this sequence particularly useful for analyzing deep phylogeny. The
same is true for histones, for much the same reason. But you typically,
and misleadingly, apply these percentages of evolutionary constraint to
large molecules that show a much, much lower amount of evolutionary
constraint. According to your logic, the small fibrinogen peptide
sequence should be highly constrained as well, as it is part of a very
large fibrin protein.

> An additional 36 positions only vary
> between 2 or 3 different amino acids. Another 15 positions vary
> between no more than 4 different amino acids. Only 4 positions out of
> 105 positions vary by more than 7 different amino acids. The two most
> variable positions (60 and 89) vary by only 9 out of 20 possible amino
> acids.

Yes. I have no problem with the idea that cytochrome c has a higher
degree of evolutionary constraint than other proteins. There is
definitely evidence for that. But remember that you are only looking at
40 sequences (and are probably overweighted in sequences in metazoan
animals that diverged recently). Positions that vary by more than 7
amino acids are *not* highly constrained, when you are only looking at
40 sequences, given that, even assuming a random sampling, you wouldn't
expect to see all 20 amino acids at any single position when you are
only examining 40 sequences.
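
The sampling point can be illustrated with a minimal sketch. It assumes, purely for illustration, that each of the 40 sequences contributes an independent, uniformly random amino acid at the position in question. Real sequences are related by descent and are not uniform, so this is a simplified model and the names are hypothetical.

```python
from math import comb

AMINO_ACIDS = 20   # possible residues at one position
SAMPLES = 40       # number of sequences examined

# Expected number of distinct amino acids seen in 40 uniform random draws:
# each amino acid is missed with probability (19/20)^40
expected_distinct = AMINO_ACIDS * (1 - (1 - 1 / AMINO_ACIDS) ** SAMPLES)

# Probability that all 20 appear at least once, by inclusion-exclusion
# over the set of amino acids that are never drawn
p_all_seen = sum(
    (-1) ** k * comb(AMINO_ACIDS, k) * ((AMINO_ACIDS - k) / AMINO_ACIDS) ** SAMPLES
    for k in range(AMINO_ACIDS + 1)
)

print(f"expected distinct amino acids: {expected_distinct:.2f}")
print(f"probability all 20 are seen:   {p_all_seen:.3f}")
```

Even in this idealized unconstrained case, fewer than all 20 amino acids are expected at a position, and seeing all 20 is unlikely.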

More importantly, however, there is no way you can use these numbers
from any *modern* protein or system to say *anything* about how
difficult or easy it was to evolve that system. The numbers are
completely irrelevant. The *only* model of evolution that you can use
these numbers to test is the model your math tells us you are testing.
Specifically, these numbers can tell us only the odds of the modern
system evolving from the starting point of a random sequence by a
process of a complete random walk with no possible intermediate states
of utility. Since no one proposes that any *modern* protein or system,
with rare exception like the nylonase case, evolves this way, your
numbers are irrelevant *even if* they are correct.


>
> It is quite obvious that the cytochrome c type of function is rather
> constrained by both its minimum amino acid requirement

You *still* haven't told me how you calculated "minimum amino acid
requirement" besides saying that some positions in cytochrome c are
strongly conserved, others less conserved, and others still less. Then
we get a wave of your hand and a number.

> as well as the
> fairly high degree of specificity required by the sequencing of these
> amino acids. Well over 60% of this protein is restrained to within 3
> amino acid options out of the 20 that are possible.

The average time for a 100% probability of replacement of a single
selectively neutral amino acid is 100 million years, Sean. If two of
the sequences you examine were eutherian mammals (which separated some
60 million years ago), that would mean that 40% of their sequence would
be identical *EVEN IF* every amino acid in their cytochrome c were
selectively neutral and free to drift. And it would take around 300
million years for all the possible amino acids due to neutral selection
to be reached from an ancestral sequence *by chance alone* (selection,
of course, works much, much faster -- indeed, in certain environments
one generation is enough time) because you would need mutation at more
than one site in a codon. Have you taken this into account at all? I
sure can't tell, because all you do is make a series of statements and
then wave your hand to come up with a number. Not that it really
matters, since even if the number were correct, it would be irrelevant.

> That is a
> significant constraint wouldn't you say? In fact the most generous
> estimates of the total number of possible cytochrome c sequences in
> sequence space that I have come across, based on these constraints,
> suggest no more than 10e60 cytochrome c sequences exist. If you know
> something different than I have suggested here, please do show me
> otherwise.

That certainly is a large number of potential different cytochrome c
sequences (that is, sequences with quite significant cytochrome c
activity). And the question is, of what possible relevance is that
knowledge to what you are saying wrt the impossibility of evolution?
Unless, of course, your logic is that cytochrome c sequences arise by a
completely random process from a random sequence?



> Now you ask, "What value do I give such numbers?" What does this
> estimate mean? Given that the total number of possible sequences in
> sequence space requiring a minimum of 80 amino acids is well over
> 10e100, the ratio of cytochrome c sequences to non-cytochrome c
> sequences is less than 1 in 10e40 sequences.

SFW? You keep coming up with this bogus ratio that presumes that
cytochrome c, or whatever protein or system one is talking about, is
derived by starting from a random sequence and generating the final
result by a completely random walk with no possible states of
intermediate utility. And then you turn around and deny that that is
what you are saying.

> This translates into an
> average gap of 30 amino acid changes between various islands of
> cytochrome c sequence clusters within sequence space.

And how is this "average gap" calculated from those two numbers? Be
precise. And is that or is that not the "average gap" between a
*random* protein or a *random* sequence and some sequence that has
cytochrome c activity, just as all your other calculations are based on
putative gaps between *random* proteins and *random* sequences and some
teleologically determined end point, with the proviso that only a random
walk with no functional intermediates is possible between the two states?

Of course, as my little exercise with ebg shows, evolution does not work
by starting with a *random* protein or a *random* sequence. And all
that matters is the number of mutational steps required to generate a
selectable activity even if that activity is not the end activity in
question. IOW, your numbers are totally irrelevant. Not *even* wrong.
But as useful as knowing the distance between earth and the furthest
galaxy is to determining how one gets from L.A. to San Francisco.
Utterly irrelevant.

> In other words,
> if a particular organism that did not have a cytochrome c function but
> would benefit from this type of function, its current proteins would
> differ from the nearest potential cytochrome c sequence by an average
> of over 30 amino acid positions.

Evolution would not *start* with the average protein. It would start
with the extreme end of the bell-shaped distribution of proteins that is
only 1 or 2 amino acids away from a selectable function. In the case of
cytochrome c, it would start with a protein that already has an affinity
for heme (there are other heme-binding proteins) that, at least some of
the time, restricted the direction of electron transfer to the ends, and
subsequently select for variants that more frequently and with greater
stability covered surfaces to direct electron transfer to the ends, with
selection favoring nearly every step of this process. That process
would quickly (but only on a geological timescale, not a human one)
reach an 'optimal' state. Subsequent change would produce (and be
largely limited to) the selectively neutral differences we see in all
the modern cytochrome c's. Unlike the changes due to selection, these
selectively neutral differences occur slowly on a geological timescale.

> Don't you find that rather
> significant? If I were you this fact would strike me as quite alarming
> considering your belief in the validity of evolution and considering
> the fact that far greater levels of specified complexity exist within
> all living things.

I consider your "fact" as 1) unsubstantiated even in its own terms, 2)
irrelevant even if correct. Nothing you have said here makes your
statements either substantiated or relevant if substantiated.


>
> file:///C:/Documents%20and%20Settings/Sean/My%20Documents/Evolution/References/280,46,Slide
> 46
>
> > > This is not so for the lactase function.
> > > Can a functional lactase enzyme to work in a beneficial manner in any
> > > life form with only 80 amino acids?

80 total amino acids or 80 amino acids involved in the hydrolysis of a
particular glycoside? The number of amino acids involved directly in
the hydrolysis is quite small. The number of amino acids involved in
binding a particular sugar (galactose rather than, say, glucose) is also
small and different from and independent of the amino acids involved in
the hydrolysis reaction. Yet your argument is that one cannot change
just, say, the sugar binding site of a previously existing protein but
must change *all* the amino acids to invent all these 'functions' from a
random sequence. And, as I have pointed out, no one thinks that lactase
of any kind arose from a *random* protein or a *random* sequence.
Rather, it arose by modification of a pre-existing protein that already
hydrolysed glycoside linkages. Any calculation of the odds of lactase
arising from some average or *random* protein or sequence is simply
irrelevant, whether the number of amino acids involved in the active
sites and binding sites is 80, 480, 1030, or 3.

> > You keep confounding total number of amino acids with number of amino
> > acids needed to perform a function.
>
> I suggest to you that the minimum number of amino acids needed to
> perform a particular type of function is indeed the total number of
> amino acids needed to perform that type of function at its minimum
> level of beneficial selectability. I fail to understand what you are
> trying to get at here.

I am trying to see how you actually determine "minimum number of amino
acids needed to perform a particular type of function". You keep
presenting these hand-waving numbers that appear to be nothing more than
WAGs. Not, of course, that knowing this would make your argument
relevant, based as it is on the idea that evolution works by starting
with a *random* sequence and proceeds by a *random* walk. But it is an
interesting point in its own right.


>
> > > I don't think so. It seem like
> > > the lactase function requires more than 400 fairly specified amino
> > > acids (though less specified than in the cytochrome c function) before
> > > this type of function can be realized.
> >
> > So, shouting again, HOW THE BLOODY HELL DID YOU CALCULATE
> > A NUMBER OF 400 AMINO ACIDS? I STRONGLY SUSPECT THAT
> > NUMBER IS NOTHING BUT A WAG (WILD-ASSED GUESS).
>
> If you don't believe this number, it should be a fairly simple thing
> for you to disprove this assertion - which is based on my own BLAST
> database search of known lactases.

That is insufficient information. *HOW* did you use the BLAST database
to determine this number?

> Ian Musgrave also suggested, after
> his own database search, that the minimum number might be as high as
> 480 amino acids for the most basic lactase enzyme. Now, if you can
> find a lactase enzyme shorter than 400aa actually working in a living
> creature I would be very glad to know of it.

HOW was this determined? Stop beating around the bush. If your number
is nothing but a WAG based on very little analysis, tell me. If you
actually did some analysis tell me what you did. I am not saying that
the number is wrong or right (although it certainly is irrelevant). I
just want to know HOW THE BLOODY HELL YOU ARRIVED AT THAT NUMBER IN A
WAY THAT WAS NOT JUST A WAG. What assumptions went into your
'calculations'? How did you arrive at the numbers you did?

> Until then, your
> hollering isn't going to disprove my position or lessen its predictive
> power. It would be much more convincing, on your part, if you were to
> actually try to find evidence against the stuff that I say instead of
> simply yelling out your incredulous one-liners.

How can I even try to find evidence against what you say when all I have
is a bunch of statements about constrained sites and then a hand-wave
and a final number? Until you tell me the apparently secret formula you
used for calculating the number you come up with, all I have is this
irrelevant hand-wave number that you *assert* without evidence is
somehow meaningful. Once you tell me what assumptions and calculations
went into generating these numbers, I can probably tell you where you
went wrong -- most likely by taking a number and making it more
meaningful than it really is or by taking any amino acid site you can
conceivably say shows some sort of 'constraint' and treating it like it
needed to be invariant.

Again, even if the number were correct, it would be irrelevant to any
model of evolution except the one where you start with a completely
random sequence and proceed by a functionless random walk. But it would
be nice to be able to say that the number was correct or incorrect. I
cannot do that until you tell me how it was determined.


>
> > > Now, can you get the flagellar
> > > motility type of function with only 400-coded amino acid positions
> > > working together at the same time? I don't think so. This type of
> > > function seems to require at least 4,000 to 6,000 fairly specified
> > > amino acids working together at the same time in order for the bare
> > > minimum level of beneficial function of this type to be realized.
> >
> > AND AGAIN, HOW THE BLOODY HELL DID YOU CALCULATE THAT
> > NUMBER? I STRONGLY SUSPECT THAT NUMBER IS NOTHING BUT
> > A WAG. Convince me otherwise.
>
> Now this is promising. It least by making this statement you may be
> starting to realize that if my assertions are true that evolution is
> in big trouble.
>
> In any case, I notice that you didn't respond when I did roughly
> detail how I calculated this number below. How can you yell out this
> question after having read what I wrote? I suggest to you that a more
> effective response would be to try and counter what I already wrote.
> Again, the calculation is based on the following:
>
> This type of function (the flagellar system of motility) requires, at
> minimum, at least 20 different kinds of proteins working together at
> the same time. If each of these proteins is composed of around 300
> fairly specified amino acids on average this works out to around
> 6,000aa collective, novel, fairly specified amino acid positions
> working together at the same time for this time of function to be
> realized at a minimum level of beneficial selectability.

I didn't respond because it is so damn vague. I know that the flagellum
has 20+ different proteins. It is up to you to specify which proteins
are to be included in 'the system'. I do not know that these proteins
are all *working* together *at the same time*. Most of them are simply
attached to other proteins and do nothing at all but provide a structure
that gets moved around by the independent actions of other proteins.
Much of the movement is induced by only a few of the proteins, and they
have little or no contact with most of the other proteins. Yet other
proteins involved in flagella production are involved as scaffolding in
the construction and are not present in the final product. So you
really do need to name names and tell me exactly what you mean by
"working together at the same time". And you certainly have not
presented any justification at all that each of these proteins involves
300 'fairly specified amino acids on average' nor presented any detailed
analysis of how one can determine this.

And, as I have repeatedly pointed out, even *if* you could do that, you
have yet to convince me that anyone thinks that the bacterial flagellum
arose from a whole set of 20 *average* or *random* proteins or 20
different *average* or *random* sequences by a random walk with no
possibility of intermediate utility. That is compared to the usual
evolutionary alternative mechanism of the flagellum evolving stepwise
using decidedly non-average, non-random proteins which have functional
relevance to each step in the process and in which each step produces
intermediate structures that have independent functional and selectable utility.

After all, the evolution of lactase activity in E. coli when its
original lacZ is deleted occurs, not by a random walk from a random
protein or a random sequence, but by a small selectable step in a
*specific* pre-existing protein that is only one mutation away from
having some selectable lactase activity. All your calculations based on
minimal number of amino acids and complexity are utterly irrelevant in
that case. All that counted was the existence of a specific protein
without lactase activity, but with the potential to be modified to have
that activity, i.e., the number of mutational steps required to get
activity in the case where evolution *did* occur was completely
unrelated to numbers you would calculate for lactase activity. It is
likely also irrelevant in all other cases of evolution as well.



> Now there you go. All you have to do is prove that my assertions here
> are wrong.

Like I say, whether they are wrong or right in their own terms doesn't
really matter. They would still be irrelevant. And I can neither agree
with nor disprove your assertions (your WAG numbers) until you tell me,
very explicitly, how they were arrived at.

> Upon what basis am I way off base here?

The basis where you assume, in your calculations, that evolution works
by starting with a random protein or a random sequence and proceeds by a
random walk with no possible intermediate utility. Other than that, I
cannot argue whether your numbers are right or wrong because you never
tell me how you went about generating them, what assumptions you make
and how you dealt with complexities like the rate of neutral drift. You
just make assertions and then poof the number out as an unevidenced WAG.
And then pretend it is my fault that I cannot read your mind wrt how you
came up with that number.

> What is the
> minimum number of coded amino acid positions that you would suggest in
> order to realize this type of function at a minimum level of
> beneficial selectability?

I wouldn't start with a random sequence or random protein and proceed by
a random walk with no possible intermediate utility. Evolution doesn't.
So why should I if I want to say something about how evolution really works?

RobinGoodfellow

Jan 3, 2004, 4:06:24 AM
seanpi...@naturalselection.0catch.com (Sean Pitman) wrote in message news:<80d0c26f.03123...@posting.google.com>...

Good gravy! That was so wrong, it feels wrong to even use the word
"wrong" to describe it. All I can recommend is that you run, don't
walk, to your nearest college or university, and sign up as quickly as
you can for a few math and/or statistics courses: I especially
recommend courses in probability theory and stochastic modelling.
With all due respect, Sean, I am beginning to see why the biologists
and biochemists in this group are so frustrated with you: my
background in those fields is fairly weak - enough to find your
arguments unconvincing but not necessarily ridiculous - but if you are
as weak with biochemistry as you are with statistical and
computational problems, then I can see why knowledgeable people in
those areas would cringe at your posts.

I'll try to address some of the mistakes you've made below, though I
doubt that I can do much to dispel your misconceptions. Much of my
reply will not even concern evolution in a real sense, since I wish to
highlight and address the mathematical errors that you are making.

> RobinGoodfellow <lmuc...@yahoo.com> wrote in message news:<bsd7ue$r1c$1...@news01.cit.cornell.edu>...

> > It is even worse than that. Even random walks starting at random points
> > in N-dimensional space can, in theory, be used to sample the states
> > with a desired property X (such as Sean's "beneficial sequences"), even
> > if the number of such states is exponentially small compared to the
> > total state space size.
>
> This depends upon just how exponentially small the number of
> beneficial states is relative to the state space.

No, it does not. If you take away anything from this discussion, it
has to be this: the relative number of beneficial states has virtually
no bearing on the amount of time a local search algorithm will need to
find such a state. The things that *would* matter are the
distribution of beneficial states through the state space, the types
of steps the local search is allowed to take (and the probabilities
associated with each step), and the starting point. For an extreme
example, consider a space of strings of length 1000, where
each position can be occupied by one of 10 possible characters.
Suppose there are only two beneficial strings: ABC........, and
BBC........ (where the dots correspond to the same characters). The
allowed transitions between states are point mutations, that are
equally probable for each position and each character from the
alphabet. Suppose, furthermore, that we start at the beneficial state
ABC. Then, the probability of a transition from ABC... to BBC... in a
single mutation is 1/(10*1000) = 1/10000 (assuming self-loops, i.e.
mutations that do not alter the string, are allowed). Thus, a random
walk that restarts each time after the first step (or alternatively, a
random walk performed by a large population of sequences, each
starting at state ABC...) is expected to explore, on average, 10000
states before finding the next beneficial sequence. Now, below, we
will apply your model to the same problem.
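
As a rough check on the 10,000 figure, here is a minimal Python sketch (not part of the original post). It assumes, as in the toy model, that each mutation picks a position and a character uniformly at random, self-loops included. The function names and the small simulation sizes (length 20, alphabet 4) are hypothetical choices made only so the simulation runs quickly.

```python
import random
from fractions import Fraction


def one_step_hit_probability(length: int, alphabet_size: int) -> Fraction:
    """Chance that one uniform point mutation turns the start string into
    one specific neighbour that differs at exactly one position."""
    # Right position (1/length) times right character (1/alphabet_size);
    # self-loops are allowed, as stated in the toy model.
    return Fraction(1, length * alphabet_size)


def expected_states_explored(length: int, alphabet_size: int) -> Fraction:
    """Mean of a geometric distribution: 1/p single-step restarts."""
    return 1 / one_step_hit_probability(length, alphabet_size)


def simulate_restarts(length: int, alphabet_size: int, rng: random.Random) -> int:
    """Count single-step restarts until the target neighbour is hit.
    Position 0 and character 1 stand in for the A -> B change (assumption)."""
    tries = 0
    while True:
        tries += 1
        position = rng.randrange(length)
        character = rng.randrange(alphabet_size)
        if position == 0 and character == 1:
            return tries


if __name__ == "__main__":
    # Toy model from the post: length 1000, 10 characters.
    print(expected_states_explored(1000, 10))
    # Smaller instance, simulated: the mean should land near 20 * 4 = 80.
    rng = random.Random(0)
    runs = [simulate_restarts(20, 4, rng) for _ in range(20000)]
    print(sum(runs) / len(runs))
```

The exact value depends only on the string length and alphabet size, not on how many of the 10^1000 strings are beneficial.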

> It also depends
> upon how fast this space is searched through. For example, if the
> ratio of beneficial states to non-beneficial states is as high as say,
> 1 in 1e12, and if 1e9 states are searched each second, how long will
> it take, on average, to find a new beneficial state?

OK. Let's take my example, instead, and apply your calculations.
There are only 2 beneficial sequences, out of the state space of
1e1000 sequences. Since the ratio of beneficial sequences to
non-beneficial ones is (2/10^1000), if your "statistics" are correct,
then I should be exploring 10^1000/2 states, on average, before
finding the next beneficial state. That is a huge, huge, huge number.
So why does my very simple random walk explore only 10,000 states,
when the ratio of beneficial sequences is so small?

The answer is simple - the ratio of beneficial states does NOT matter!
All that matters is their distribution, and how well a particular
random walk is suited to explore this distribution. (Again, it is a
gross, meaningless over-simplification to model evolution as a random
walk over a frozen N-dimensional sequence space, but my point is that
your calculations are wrong even for that relatively simple model.)

> It will take
> just over 1,000 seconds - a bit less than 20 minutes on average. But,
> what happens if at higher levels of functional complexity the density
> of beneficial functions decreases exponentially with each step up the
> ladder? The rate of search stays the same, but the junk sequences
> increase exponentially and so the time required to find the rarer and
> rarer beneficial states also increases exponentially.

The above is only true if you use the following search algorithm:
1. Generate a completely random N-character sequence
2. If the sequence is beneficial, say "OK";
Otherwise, go to step 1.

For an alphabet of size S, where only k characters are "beneficial" for
each position, the above search algorithm will indeed need to explore
exponentially many states in N (on average, (S/k)^N) before finding a
beneficial state. But this analysis applies only to the above search
algorithm - an extremely naive approach that resembles nothing that
is going on in nature. The above algorithm isn't even a random walk
per se, since random walks make local modifications to the current
state, rather than generate entire states anew. A random walk
starting at a given beneficial sequence, and allowing certain
transitions from one sequence to another, would require a completely
different type of analysis. In the analyses of most such search
algorithms, the "ratio" of beneficial sequences would be irrelevant -
it is their *distribution* that would determine how well such an
algorithm would perform. My example above demonstrates a problem
where the ratio of beneficial states is extremely tiny, yet the
search finds a new beneficial state relatively quickly. I could also
very easily construct an example where the ratio is nearly one, yet a
random walk starting at a given beneficial sequence would stall with a
very high probability. In other words, Sean, your calculations are
irrelevant for the kind of problem you are trying to analyze. If you
wish to model evolution as a random walk of point mutations on a
frozen N-dimensional sequence space, you will need to apply a totally
different statistical analysis: one that takes into account the
distributions of known "beneficial" sequences in sequence space. And
then I'll tell you why that model too is so wrong as to be totally
irrelevant.
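
For comparison, the two-step generate-and-test procedure described above can be sketched in Python (again, not part of the original post). The sketch assumes each position independently accepts k of the S characters, and it arbitrarily treats characters 0..k-1 as the allowed ones. The function names are hypothetical.

```python
import random
from fractions import Fraction


def expected_draws_generate_and_test(alphabet_size: int,
                                     allowed_per_position: int,
                                     length: int) -> Fraction:
    """Expected number of completely random sequences drawn before one is
    'beneficial': (S/k)^N, the mean of a geometric distribution."""
    return Fraction(alphabet_size, allowed_per_position) ** length


def generate_and_test(alphabet_size: int, allowed_per_position: int,
                      length: int, rng: random.Random) -> int:
    """Step 1: draw a whole new sequence. Step 2: stop if every position
    holds an allowed character, otherwise go back to step 1."""
    draws = 0
    while True:
        draws += 1
        sequence = [rng.randrange(alphabet_size) for _ in range(length)]
        if all(ch < allowed_per_position for ch in sequence):
            return draws


if __name__ == "__main__":
    # Growth is exponential in N for this algorithm only.
    for n in (5, 10, 20):
        print(n, expected_draws_generate_and_test(20, 2, n))
    # Small simulated instance: the mean should be near (4/2)^6 = 64.
    rng = random.Random(0)
    runs = [generate_and_test(4, 2, 6, rng) for _ in range(3000)]
    print(sum(runs) / len(runs))
```

The exponential cost comes from discarding the whole sequence on every failure. A walk that keeps its current state and changes one position at a time is a different process and needs a different analysis.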

> > Such random walks are at the heart of
> > Monte-Carlo methods, used to solve a wide variety of problems in
> > physics, statistics, computer science, etc. The time requirements
> > for such a random walk would depend on the distribution of valid states
> > (i.e. "beneficial sequences") in the space, the transition probabilities
> > between each state, and, to a lesser extent, the starting point.
>
> Exactly. And the "beneficial sequences" (i.e., the density of
> beneficial sequences) is inversely related, in an exponential manner,
> to the level of minimum informational complexity required for these
> functions to work at a minimum level of beneficial function.

Already addressed above. By the way, you still owe me a working
definition of "informational complexity". Is it related to the total
number of amino acids in all the proteins of a system? The number of
amino acids at the active sites? The amount of genomic information
needed to code for those amino acids? And what exactly is a "minimum
level of beneficial function"? Would such a level remain
invariant, or would it change depending on an organism's environment
(e.g. for the lactase function, do you think this "minimum level"
would be the same in lactose rich and lactose poor environments?)
Finally, how do we determine that a certain level of complexity is
*required* to achieve this "minimum level of beneficial function" (as
opposed to simply observing that systems with such and such level of
complexity happen to perform this function)?

> > Of
> > course, the size of each state (i.e. the dimension of the space) is also
> > a factor, but the key point (and what makes Monte-Carlo techniques so
> > useful) is that the relationship between time and state size need not be
> > exponential. Depending on the specific details described above, the
> > time requirement may be any function of state size - possibly even a
> > linear function!
>
> No, go and check these formulas again and then show me how they are
> "linear" with increasing minimum state sizes. They are not linear,
> but exponential relationships.

Really, Sean? Well, in the toy example I gave above, the number of
states searched is exactly linear in the state size. Specifically,
for sequence of length N, 10*N states must be searched before the new
beneficial state is found. Which, again, reinforces my point that
the relative number of "beneficial states" doesn't matter: their
distribution does. Your "formulas" are irrelevant.

All in all, do you really think that Monte-Carlo random-walk
procedures would be used in practice if their expected running times
were exponential in the size of the problem? Exponential functions do
not scale very well at all: that is why such procedures are used in
the first place to search through exponentially large state spaces for
an exponentially small number of solutions in *sub-exponential* time.
Perhaps you should contact all the researchers relying on these
methods day-to-day, and tell them to stop wasting their time because
these methods don't work - despite the countless times that they have,
and the theoretical analysis demonstrating that they do? Or, as a
better idea, perhaps you should realize that statistical problems
usually require much more sophisticated answers than raising 20 to the
N-th power, and go learn something?

> However, even if they actually were
> linear as you suggest, this would still pose a significant problem to
> evolution beyond a certain point of informational complexity. Even a
> linear decrease in density with increasing minimum space size would
> result in a linear increase in required time to find new functions at
> that level of complexity.

Thank you for a hearty laugh! For further amusement, I would really
love to see your calculations to back up this claim. I am tempted to
nominate it for a Chez Watt, but I don't know how many computer
scientists read this forum. But please, enlighten me. What would
this level of complexity be? And why do you think such systems exist
in nature?

This is all the time I have for now. I'll try to get back to the rest
of your post within the next day or two, but for now I'd like to leave
you with an admonishment. It is clear that your background in some of
the areas where you are arguing is, to put it mildly, tenuous.
However, it is also clear that you are an intelligent person and, at
least to an extent, curious about the world. Don't you think that you
owe it to yourself to go out and
learn something about the subjects you wish to argue, so as at least
to appear credible when presenting your arguments to individuals with
a certain amount of in-depth knowledge in these fields? So far you
have failed to win such credibility for yourself, which is a shame,
since your intentions appear to be honest. Perhaps taking the time to
revise, adjust, or possibly retract some of your claims would be
taking a step in the right direction?

Cheers,
RobinGoodfellow.

[snip rest for now]

Sean Pitman

Jan 3, 2004, 6:18:21 AM
Howard Hershey <hers...@indiana.edu> wrote in message news:<3FF4957C...@indiana.edu>...


> 1) You seem to agree that the native ebga does not have any selectable
> lactase activity. Thus generating selectable lactase activity from ebg
> is generating a 'new' function. Is that right?

Yes.

> 2) You do agree that the native ebg involves a two peptide system, with
> ebga being 1030 amino acids long and ebgc being 149 amino acids long,
> both being required for function, plus a regulatory protein (ebgr) which
> is also around 1000 amino acids long.

At this point it might be helpful to consider that the usual wild type
lacZ genes in E. coli produce a tetramer beta-galactosidase. Each
subunit of this tetramer is around 1000aa in size. However, this is
not the minimum size requirement for this type of function to be
realized at a beneficial level of selectability. The minimum size
requirement seems to be well over 400aa. Considering that 12 to 14 of
15 active site residues are identical between LacZ and ebgA, I would
also think that the minimum sequence requirements would also be
similar (i.e., somewhere around 400aa). Also, it is interesting to
note that the ebgC sequence has none of the active site residues and
yet it seems to be essential, as you noted yourself, for the lactase
function. It appears that this small subunit is essential for the
optimal operation of electrophilic catalysis by the active-site Mg^2+.
Also note that a correct mutation in either the ebgR or the ebgA
genes alone will allow selectably advantageous lactase ability. Of
course both mutations occurring at the same time allow for a much
stronger lactase function, but both mutations are not required before
selectable lactase function can be realized. It is known that the
mutation in ebgR arises first, allowing the cells to grow very
slowly on lactulose. The second mutation (in the ebgA gene) then
arises and allows the double mutants to grow very rapidly.

http://www.biochemj.org/bj/325/0117/3250117.pdf
http://www.science.siu.edu/microbiology/micr460/460%20Pages/460.SPAM.html

> 3) You seem to think that knowing the total length of the proteins
> involved (in this case, about 1200 for the two that act together at the
> same time) and how many proteins are involved in the system (2) allows
> you to determine the number of amino acids that are 'fairly specified'
> and the 'level of complexity'.

As I have said many many times before, I am interested in knowing the
*minimum* number and specificity of amino acids required to achieve a
particular type of function. I dare say that the 1200aa normally used
in this case are not all needed and are not all that constrained.
As explained already, a more likely minimum number of required amino
acids is probably somewhere around 400 relatively loosely specified
amino acids.

> Please perform this mathematics on the
> ebg system for me. If you cannot calculate "the total number of amino
> acids required for a particular type of function to be realized at its
> most minimum beneficial level of function" for a simple system like ebg,
> what makes you think you can do so for a larger or more complex one?

But I can. I suggest to you that the type of function produced by the
ebg system has a very similar minimum size requirement and positional
constraint limits as do other lactase genes/systems which seem to have
a minimum requirement of somewhere over 400 relatively loosely
specified amino acids.

Now, you can easily prove me wrong here by finding a functional
lactase enzyme that requires less than 400aa. Do you know of such a
lactase that actually works to some selectable advantage in any living
thing?

> 4) After calculating "the total number of amino acids required for a
> particular type of function to be realized at its most minimum
> beneficial level of function" (you claim to be able to do so -- at least
> I have seen you give estimates of around 480 or so for ebg, but it would
> certainly be nice to see what went into the calculation)

This calculation is based on my own database search and the searches of
others that suggest that there are no functional lactase enzymes
smaller than 400aa. So, it seems like the ~400aa level is the best
"minimum" requirement that the evidence available to me so far
supports. If you think otherwise, please do present this evidence.

> I want you to
> calculate the odds of ebg evolving into a selectable beta-galactosidase
> enzyme *based solely on these numbers* and NOT based on any other
> knowledge. This estimate of the odds of generating functional lactase
> activity from ebg would be called a 'prediction' of *your* 'hypothesis'.

The odds are extremely good that the wild-type ebg sequence will
evolve into a selectable beta-galactosidase in short order (one or two
generations) since it is only a single positional change away from
success, but that is not the important question. My idea doesn't look
at sequences so much as it looks at types of functions. What are the
odds that a particular organism or group of organisms will have
anything within their collective genomes that is close enough to
evolve any type of new beneficial function within a given level of
specified complexity? That is the important question.

Given this question, it is very interesting that the E. coli bacterial
species seems to have a "spare tire" lactase gene that is just one
mutation away from success. This would not be such an interesting
finding if lactases were less specified than they are. For example,
if the density of lactase sequences in 400aa level of sequence space
were say as high as 1 in a billion, the average gap between lactases
would be less than 7 mutations wide. For a colony of bacteria
numbering say 10 billion individuals, this gap would be crossed in no
more than several months by all types of bacteria. What is
interesting though is the very "limited evolutionary potential" that
many types of bacteria have when it comes to the evolution of this
relatively simple enzymatic function. Without their spare tire gene,
E. coli cannot evolve this lactase function despite very positive
selection pressure, artificially elevated mutation rates, and tens of
thousands of generations of time. Many other types of bacteria have
not been able to evolve this relatively simple lactase function
despite well over a million generations of documented observation.
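
The density-to-gap figures used in this post follow from simple arithmetic, sketched below in Python (not part of the original post). The sketch takes the post's own working assumption at face value: a density of 1 in D over a 20-letter amino acid alphabet corresponds to an average gap of about log base 20 of D substitutions. Whether that assumption holds for real sequence space is exactly what is disputed in this thread. The function names are hypothetical.

```python
import math

AMINO_ACIDS = 20  # size of the standard amino acid alphabet


def gap_in_mutations(one_in: float) -> float:
    """Gap size implied by a density of 1 in `one_in`, under the post's
    assumption that the gap is log_20 of the inverse density."""
    return math.log(one_in, AMINO_ACIDS)


def sequences_spanned(mutations: int) -> int:
    """Number of distinct sequences over `mutations` fully free positions."""
    return AMINO_ACIDS ** mutations


if __name__ == "__main__":
    print(gap_in_mutations(1e9))    # a little under 7
    print(gap_in_mutations(1000))   # between 2 and 3
    print(sequences_spanned(12))    # 20^12, roughly 4,000 trillion
```

These numbers only restate the post's model. They say nothing about how beneficial sequences are actually distributed, which is the point raised against the model elsewhere in the thread.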

So, what does this mean? It means that the density of sequences with
the lactase function is actually quite low. This low density is what
limits the evolutionary potential of many organisms that would
otherwise benefit from a lactase enzyme if they were able to evolve
one. The fact that they do not evolve one means that the gap between
what they have and the nearest lactase enzyme is simply more than a
dozen fairly specified mutations away.

> 5) Now let's take a different protein or protein system, also 1200 total
> amino acids in length. We will make it a bit less complex, by making it
> a single unregulated protein.

The fact that a protein operates as a single unit does not make it
less complex than a multiprotein function that requires the same
minimum amino acid number and level of specificity. Also, all protein
functions are regulated in one form or another.

> But this protein is in the histidine
> pathway. That is, it is a random protein wrt lactase function, chosen
> merely because of total amino acids present. Let's say that for *its
> function* the very same "total number of amino acids required for a
> particular type of function to be realized at its most minimum
> beneficial level of function" exists. I want you to calculate the odds
> of this protein evolving into a selectable beta-galactosidase enzyme
> *based solely on the numbers you think are important* and NOT based on
> any other knowledge about this protein.

Starting with a random sequence of 1,200aa acting in some beneficial
manner, you are asking how long it would take to evolve a
beta-galactosidase? Is that what you are asking? If so, then say the
density of lactases in sequence space of 400aa minimum was low enough
to require 24 specified mutations, on average, to go from one lactase
island to another. If true, then, on average, a sequence of 400aa in
a given gene pool would be around 12 specified mutations away from the
closest lactase sequence creating a gap of 4,000 trillion non-lactase
sequences. Say the colony size is 1 trillion individuals living in a
steady state and the mutation rate is one mutation per 400aa per year
per individual lineage (a pretty high mutation rate). Well, starting
with 1,200aa in a colony of 1 trillion would give us 3 trillion
sequences of 400aa each evolving at the same time (given that this
1,200aa sequence was released from selective constraints perhaps via
gene duplication). This means that each year 3 trillion sequences out
of 4,000 trillion will be searched out. At this rate, success will be
realized in just over 1,300 years on average (defined
as the evolution of a beneficial lactase function in one member of the
population).
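
The estimate above can be reproduced with a few lines of Python (not part of the original post). The sketch uses only the figures stated in the paragraph: a gap of about 4,000 trillion sequences, a colony of 1 trillion, three 400aa segments per 1,200aa sequence, and one mutation per segment per year. It also inherits the paragraph's simplification that every mutation tests a new, never-repeated sequence. The function and parameter names are hypothetical.

```python
from fractions import Fraction


def years_to_search(gap_sequences: int, colony_size: int,
                    segments_per_sequence: int,
                    mutations_per_segment_per_year: int) -> Fraction:
    """Gap size divided by the number of sequences tested per year."""
    tested_per_year = (colony_size * segments_per_sequence
                       * mutations_per_segment_per_year)
    return Fraction(gap_sequences, tested_per_year)


if __name__ == "__main__":
    trillion = 10**12
    # Rounded gap used in the post: 4,000 trillion sequences.
    print(float(years_to_search(4000 * trillion, trillion, 3, 1)))
    # Unrounded gap of 20^12 sequences, for comparison.
    print(float(years_to_search(20**12, trillion, 3, 1)))
```

Either way the result is on the order of 1,300 to 1,400 years, which matches the figure quoted. The output is only as good as the assumed gap of 12 specified mutations.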

> 6) Same thing, except now we have a completely random sequence of 1200
> total amino acids.
>
> I will accept failure to evolve a selectable beta-galactosidase activity
> in five years as evidence that your math is correct for *that type of
> protein* (even though I really should wait a gazillion years, just to be sure).

Ok, what is your counter argument? If the density of lactases in
sequence space of 400aa was very much less than what I based my above
calculations on, then why were Hall's E. coli so limited in their
ability to evolve a type of function with such a high density of
sequences in sequence space?



> That is because your bogus math does not take the specific ancestral
> sequence and its pre-existing functionality into consideration.

Actually it does. No matter what you start with you cannot get around
the fact that on average your starting points will be a certain
distance from new sequences with new types of functions. This
distance gets exponentially larger, no matter what your starting
sequences are, at higher and higher levels of specified complexity.

For example, let's just say, by a sheer extraordinary stroke of luck
that an ancestral sequence in a bacterial colony just happened to be
one or two mutations away from a new type of function as specified and
complex as a flagellar motility system. Well, of course this highly
complex system would evolve in short order now wouldn't it? Ok, but
how many more such systems would it be able to evolve on average? How
long would it take that colony to evolve another type of function at
that same level of complexity or higher given what it now has to
proceed with? Odds are that everything it has will be gazillions of
years away from any other type of function within such a level of
complexity or higher. In fact, the odds are so great against
evolution at such levels that the witnessing of evolution at such a
level should cause one to seriously look into the almost certain
finding of a pre-existing system that had been lost for a time but
whose code was still there pretty much intact.

For example, cavefish who have lost their eyes still have the code to
make eyes in their genome. It has been shown that a single point
mutation can restore the production of fully formed eyes in the
offspring of these fish. Does this mean that eye evolution has been
demonstrated? Absolutely not. All this shows is that the evolution
of such a highly complex system requires a pre-existing code for this
system that has been shut down by a slight change to the system.
Without this historical existence of sightedness in the ancestors of
these fish, they would never have been able to evolve the ability to
see. The same is true for flagellar motility. I know that you have
suggested that the ability to mutate a flagellar system so that it no
longer works as a motility system, just keeping its TTSS system
intact, and then mutating back the motility function is an example of
high complexity evolution in action. It really is nothing of the
sort. It is on the same level as blind cavefish evolving their eyes
back again. Without the pre-established code already being there and
working at that type and level of functional complexity in the
ancestors of that organism, such levels of complex function would not
evolve in trillions upon trillions of years.

> The interesting thing, of course, is that this experiment (selection for
> the evolution of galactosidase activity) actually has been run and your
> model of calculating the odds has been tested. That is, the prediction
> based on your methods of determining the odds of evolving a new function
> has been subject to test. Did it pass the test for *all* of the
> examples, or only for the examples where you start with a random protein
> or a random sequence?

You must understand that we are talking averages here. What is the
average time required to evolve a new function at a particular level
of complexity? In other words, what is the density of beneficial
functions at various levels of complexity in sequence space? You must
have some sort of idea of the density of beneficial functions in order
to be able to estimate average evolutionary time requirements.
Certainly it was very fortunate that E. coli had at least one and
possibly two sequences within striking distance of a lactase sequence,
but this does not mean that the density of lactase functions can be
adequately estimated based on the division of the number of genes in E.
coli by the number of lactase sequences in E. coli. This is a
fallacy. By this method it would seem that the density of lactase
sequences is as high as 1 in 1000 amino acid sequences. This is
obviously incorrect or a beneficial lactase would be no more than 3
mutations away from any 400aa sequence. The evolution of lactase
would be lightning fast at this density in all types of bacteria.
The far more telling evidence is found in the limited ability of lacZ
and ebg negative bacteria to evolve the lactase function over the
course of tens of thousands of generations. That observation gives a
much clearer idea about just how low the density of lactase sequences
really is.



> Remember that no one (except your argument) is proposing that bacterial
> flagella arose in one fell swoop with no intermediate states of utility.
> Quite the opposite.

Obviously that is what you evolutionists fervently believe and is
actually what is required. You must have intermediate steppingstones
that are each selectably beneficial. Your problem is that you don't
have these stones. Your proposed evolutionary pathways for the
flagellar motility system are sorely lacking, involving huge gaps that
have never been crossed. Not even one of your proposed steps in the
evolution of a flagellar motility system has been demonstrated to
evolve in real life - not one. Come on now, where are these
steppingstone functions? They just aren't there because those types
of functions at this level of specified complexity are so far away
from all stepping stones that universes of sequences must be sorted
through before any function at this level can be found.

_________
Howard Hershey <hers...@indiana.edu> wrote in message news:<3FF5C491...@indiana.edu>...



> > > I have asked this repeatedly, so now I will shout. HOW THE BLOODY HELL
> > > DO YOU DEFINE WHAT IS OR IS NOT 'FAIRLY SPECIFIED ORDER'
> > > AND WHAT VALUE DO YOU GIVE IT?
> >
> > Specified order is defined by the degree of constraints required on
> > amino acid positions. In other words, how many positions are
> > completely invariant? How many more positions are partially variant
> > and to what degree? I have repeatedly talked about this so I am quite
> > surprised that you don't seem to remember reading such discussions.
> > Anyway, I will repeat using cytochrome c again as an example:
> >
> > Based on cytochrome c analysis of over 40 species, a minimum of around
> > 80 amino acids are required for this type of function to be realized
> > with around 100 or so amino acids used on average.
>
> What do you mean by "required" in this context? That they must be
> invariant? That they must be hydrophobic? That they must *not* be
> proline? Again, you repeatedly fail to define 'fairly specified' and
> how you determine that an amino acid is 'fairly specified'.

It seems to me that you fail to grasp the difference between the
minimum amino acid requirement and the minimum specificity
requirement. They are two different things. Just because a
particular function needs, say 100aa at minimum, does not mean that
all 100aa are highly specified in their order. Likewise, it seems as
though the cytochrome c function requires at least 80aa at minimum,
but this does not mean that each of these 80aa are highly specified or
"invariant". Do you understand the difference between these two types
of limitations now?

You know, it would really help if you read the entire train of thought
before you responded with long paragraphs with single words and
sentences. I often answer many of your questions and statements in
the very next sentence or paragraph, as I did this time.

> > Of these, 30 amino
> > acid positions are highly constrained - rarely involving more
> > than one type of amino acid.
>
> Rarely is not never. How many are invariant?

You are really hung up on this idea of absolute invariance. I would
dare say that very few if any positions are absolutely invariant -
taken one at a time. The same is true of a 100-character sentence.
This, however, does not mean that this type of function is not highly
constrained. Character positions do not have to be absolutely
invariant in order to be very highly constrained - right?

> It seems to me that you
> count an amino acid as 'fairly specified' if it exhibits *any*
> constraint and treat it as if it were invariant.

Come on now! Are you suggesting that a position limited to less than
3 different amino acids is not "fairly specified"? Give me a break .
. .

> Actually, I haven't
> seen you calculate anything. All I have seen you do is say that there
> are positions that are very highly constrained, positions that are
> somewhat less constrained, and positions that are even less constrained.
> Then you wave your hands and come up with a number.

The math is fairly easy here. Given the constraints listed, you can
calculate the number yourself. You will find that the 10e60 number is
actually being quite generous given these listed constraints
(referenced by Yockey and others). If you think otherwise, do your
own calculation and tell me your results.

> Again, it is well known that all *modern* cytochrome c's have a high
> percentage of evolutionarily constrained sequences.

You mean *functionally* constrained sequences.

> That is because
> cytochrome c is small and most of its amino acids are in contact with
> the substrate. The high degree of evolutionary constraint is what makes
> this sequence particularly useful for analyzing deep phylogeny.

Functional constraints really aren't useful for studying actual
evolutionary relationships. The differences are different because of
different needs for different levels of a particular type of function
because of different environmental and phenotypic demands on different
organisms. Such functional differences in very different organisms
may have always been there by design. Again, the only way to rule out
design as the only logical explanation for such differences is to show
that mindless evolutionary processes can also explain the differences
beyond the lowest levels of informational complexity.

> The
> same is true for histones, for much the same reason. But you typically,
> and misleadingly, apply these percentages of evolutionary constraint to
> large molecules that show a much, much lower amount of evolutionary
> constraint.

I do not apply these percentages to much larger single molecules. I
have repeatedly said that larger single proteins often have far lower
constraints than do smaller protein functions. Cytochrome c and
histone proteins are very highly constrained - much more so than the
larger lactase enzyme and other such larger enzymes and proteins. I
use the smaller proteins as examples because their level of constraint
is well-known and clearly documented. I use them as illustrations to
show that increased constraint results in an equivalent reduction of
density in sequence space. Larger proteins, though less constrained,
may still be quite rare due to their minimum sequence size
requirement, which is a different type of constraint. Though less
constrained than a cytochrome protein, a lactase function requires
over 4 times as many amino acids at minimum. Still, the clincher
comes when you start considering multiprotein systems where each of the
smaller individual proteins has a fairly high level of amino acid
specificity/constraint. Since all of these proteins are required to
work together at the same time, their combined number of fairly highly
constrained amino acids starts to really add up - into the multiple
thousands of fairly specified amino acids working together at the same
time. For the flagellar system I would say that this number is well
over 5,000aa.

Again, if you disagree with this number, prove me wrong. Tell me what
you think the minimum genetic real estate is to code for a flagellar
motility system.



> > An additional 36 positions only vary
> > between 2 or 3 different amino acids. Another 15 positions vary
> > between no more than 4 different amino acids. Only 4 positions out of
> > 105 positions vary by more than 7 different amino acids. The two most
> > variable positions (60 and 89) vary by only 9 out of 20 possible amino
> > acids.
>
> Yes. I have no problem with the idea that cytochrome c has a higher
> degree of evolutionary constraint than other proteins. There is
> definitely evidence for that. But remember that you are only looking at
> 40 sequences (and are probably overweighted in sequences in metazoan
> animals that diverged recently). Positions that vary by more than 7
> amino acids are *not* highly constrained, when you are only looking at
> 40 sequences, given that, even assuming a random sampling, you wouldn't
> expect to see all 20 amino acids at any single position when you are
> only examining 40 sequences.
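
The sampling point in the quoted paragraph can be made concrete with a small sketch. It assumes something the thread does not establish: that each of the 40 sequences contributes an independent, uniformly random residue at a fully unconstrained position. The function names are illustrative only.

```python
from math import comb


def expected_distinct(alphabet: int, samples: int) -> float:
    """Expected number of distinct residues seen at one position
    in `samples` independent uniform draws from `alphabet` residues."""
    return alphabet * (1 - (1 - 1 / alphabet) ** samples)


def prob_all_seen(alphabet: int, samples: int) -> float:
    """Probability that every residue appears at least once
    (inclusion-exclusion over the set of missing residues)."""
    return sum(
        (-1) ** j * comb(alphabet, j) * ((alphabet - j) / alphabet) ** samples
        for j in range(alphabet + 1)
    )


# 20 amino acids, 40 sequences, one completely unconstrained position
print(expected_distinct(20, 40))
print(prob_all_seen(20, 40))
```

Under these assumptions a completely free position would show about 17 of the 20 amino acids on average, and all 20 would appear well under 10% of the time. Real sequence sets are related by descent rather than independently drawn, so the observed variety at a free position could be lower still.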

Ok, you give me your best numbers. But remember, a limitation to 19
out of 20 is still a constraint. Then, if over 60% of the positions
of a protein are limited to within 3 amino acids with another 15-20%
limited to within 4 amino acids (~80% total), I would call that highly
constrained, and I dare say you would too if you weren't in a debate
with me.

> More importantly, however, there is no way you can use these numbers
> from any *modern* protein or system to say *anything* about how
> difficult or easy it was to evolve that system. The numbers are
> completely irrelevant.

They are not completely irrelevant. They are the best way that we
have of understanding the density of such types of functional
sequences in sequence space. And, if they were completely irrelevant
you wouldn't work so hard at arguing against them and what they
obviously mean.

> The *only* model of evolution that you can use
> these numbers to test is the model your math tells us you are testing.
> Specifically, these numbers can tell us only the odds of the modern
> system evolving from the starting point of a random sequence by a
> process of a complete random walk with no possible intermediate states
> of utility. Since no one proposes that any *modern* protein or system,
> with rare exception like the nylonase case, evolves this way, your
> numbers are irrelevant *even if* they are correct.

Ok, how else do new types of functions evolve? Explain it to me
again. It seems to me that to get a new type of function you must
evolve new sequences that actually have new types of beneficial
functions. Higher levels of functional complexity had to involve the
previous evolution of steppingstone functions, each of which was
selectably advantageous. If these steppingstones are not there, then
evolution is impossible. If they are there, then evolution is not
only possible, but easy. Do you know another way?

> > It is quite obvious that the cytochrome c type of function is rather
> > constrained by both its minimum amino acid requirement
>
> You *still* haven't told me how you calculated "minimum amino acid
> requirement" besides say that some positions in cytochrome c are
> strongly conserved, others less conserved, and others still less. Then
> we get a wave of your hand and a number.

Again, the minimum amino acid requirement is different from the
minimum level of constraint. The minimum amino acid requirement is
not a calculated number, but is estimated based on the shortest
sequence found in a living thing having this type of function.

> > as well as the
> > fairly high degree of specificity required by the sequencing of these
> > amino acids. Well over 60% of this protein is restrained to within 3
> > amino acid options out of the 20 that are possible.
>
> The average time for a 100% probability of replacement of a single
> selectively neutral amino acid is 100 million years, Sean.

Not at all. Taking a mutation rate of 1e-6 mutations/generation, a
population of just 10 billion would realize such a "neutral" mutation
in just one generation in many of its individuals. Of course, many of
the variations in such proteins as cytochrome c are not neutral, but
are functionally beneficial and maintained as such by natural
selection (see below).
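
As a rough sketch of the arithmetic in the paragraph above, the expected number of individuals carrying a new mutation in a given gene is just the per-gene rate times the population size. The function name is illustrative; the rate and population are the figures stated in the post.

```python
def expected_new_mutants(rate_per_gene_per_generation: float, population: int) -> float:
    """Expected number of individuals per generation that carry a new
    mutation somewhere in the gene of interest."""
    return rate_per_gene_per_generation * population


# 1e-6 mutations per gene per generation, population of 10 billion
print(expected_new_mutants(1e-6, 10_000_000_000))
```

With these inputs the result is about 10,000 new mutants per generation. Note that this counts mutations arising, which is a different quantity from the replacement time quoted above.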

> If two of
> the sequences you examine were eutherian mammals (which separated some
> 60 million years ago), that would mean that 40% of their sequence would
> be identical *EVEN IF* every amino acid in their cytochrome c were
> selectively neutral and free to drift.
>
> And it would take around 300
> million years for all the possible amino acids due to neutral selection
> to be reached from an ancestral sequence *by chance alone* (selection,
> of course, works much, much faster -- indeed, in certain environments
> one generation is enough time) because you would need mutation at more
> than one site in a codon. Have you taken this into account at all? I
> sure can't tell, because all you do is make a series of statements and
> then wave your hand to come up with a number. Not that it really
> matters, since even if the number were correct, it would be irrelevant.

Consider that the average mutation rate for a given gene in all
creatures is about 1e-6 mutations per gene per generation. That
means that a given gene will mutate only one time in one million
generations on average. Consider that single celled organisms have a
much shorter generation time than multi-celled organisms on average.
For example, the bacteria E. coli have a minimum generation time of 20
minutes compared to the generation time of humans of around 20 years.
With a gene being mutated every 1 to 10 million generations in E.
coli, one might think this would be a long time. However, each and
every gene in an E. coli lineage will get mutated once every 40 to 80
years. So, in one million years, each gene will have suffered at
least 10,000 mutations. Also consider that the population of single
celled organisms on earth is a lot higher than the populations of
multicelled organisms. For example, there are almost 6 billion people
living on earth today but more than 100 billion E. coli living inside
just one person's intestines.

Now, cytochrome c phylogenies are generally based on analysis of
certain subunits of cytochrome c which range in number of amino acids
up to a maximum of about 600 or so. This would translate into a
minimum of at least 1,800 nucleic acids in DNA coding for this subunit
of cytochrome c protein. Note that the tetrahymena species are about
50% different from all other creatures. It seems then that all the
creatures would have experienced at least a 25% change in their
genetic codes from the time of common ancestor. So how many
generations would it take to achieve this 25% difference?

Taking 25% of 1,800 gives us 450 mutations. Let's say that the average
mutation rate is one mutation per 1,800 nucleic acids per one million
generations. For a steady state population of just one individual in
each generation it would take about 450 million generations to get a
25% difference from the common ancestor. With a generation time of 20
minutes (i.e., E. coli), that works out to be about 342,000 years.
However, with a steady state population of say a trillion trillion
individuals (the total number of bacteria on earth is somewhere around
five million trillion trillion or 5 with 30 zeros following), one
might expect that the number of generations required to get a 25%
difference would be a bit less. So, for bacteria, the 25% difference
from the common ancestor cytochrome c, might have been achieved
relatively rapidly given the evolutionary time frame (a couple hundred
thousand years or so).
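
The single-lineage estimate above can be written out so that each input is explicit. This is only a sketch of the stated assumptions (1,800 nucleotides, a 25% difference, one mutation per million generations, 20-minute generations); the helper name and the 365-day year are my own choices.

```python
MINUTES_PER_YEAR = 60 * 24 * 365


def generations_to_years(generations: float, minutes_per_generation: float) -> float:
    """Convert a number of generations into years for a given generation time."""
    return generations * minutes_per_generation / MINUTES_PER_YEAR


mutations_needed = 0.25 * 1800                    # 25% of 1,800 nucleotides
generations_needed = mutations_needed * 1_000_000  # one mutation per million generations

print(mutations_needed)
print(generations_to_years(generations_needed, 20))  # whole 25% difference
print(generations_to_years(1_000_000, 20))           # one mutation in one lineage
```

With these inputs, 450 million generations at 20 minutes each comes to roughly 17,000 years, and one million generations to roughly 38 years. Writing the conversion out this way lets the 342,000-year and 40-to-80-year figures in the post be checked against the stated assumptions.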

The question is then, if bacteria can achieve such relatively rapid
neutral genetic drift, why are they not more wide ranging in their
cytochrome c sequences? It seems that if these cytochrome c sequence
differences were really neutral differences, that various bacterial
groups, colonies, and species, would cover the entire range of
possible cytochrome c sequences to include that of mammals. Why are
they then so uniformly separated from all other "higher" species
unless the cytochrome sequences are functionally based and therefore
statically different due to the various needs of creatures that
inhabit different environments?

For example, bacteria are thought to share a common ancestor with
creatures as diverse as snails, sponges, and fish dating all the way
back to the Cambrian period some 600 million years ago. All of these
creatures are thought to have been around quite a long time - ever
since the "Cambrian Explosion." In fact, they have all been around
long enough and are diverse enough to exhibit quite a range in
cytochrome c variation. Why then are their cytochrome c sequences so
clustered? Why don't bacteria, snails, fish, and sponges cover the
range of cytochrome c sequence variation if these variation
possibilities are in fact neutral? In other words, why are there not
at least some types of bacteria that share sequence identity with
humans?

I propose that the clustered differences that are seen in genes and
protein sequences, such as cytochrome c, are the result of differences in
actual function that actually benefit the various organisms according
to their individual needs. If the differences were in fact neutral
differences, there would be a vast overlap by now with complete
blurring of species' cytochrome c boundaries - even between species as
obviously different as humans and bacteria. Because of this, sequence
differences may not be so much the result of differences due to random
mutation over time as they are due to differences in the functional
needs of different creatures. I think that the same can be said of
most if not all phylogenies that are based on genotypic differences
between creatures.

In 1993, Patterson, Williams, and Humphries, scientists with the
British Museum, reached the following conclusion in their review of
the congruence between molecular and morphologic phylogenies:

"As morphologists with high hopes of molecular systematics, we end
this survey with our hopes dampened. Congruence between molecular
phylogenies is as elusive as it is in morphology and as it is between
molecules and morphology. . . . Partly because of morphology's long
history, congruence between morphological phylogenies is the exception
rather than the rule. With molecular phylogenies, all generated
within the last couple of decades, the situation is little better.
Many cases of incongruence between molecular phylogenies are
documented above; and when a consensus of all trees within 1% of the
shortest in a parsimony analysis is published structure or resolution
tends to evaporate."

http://naturalselection.0catch.com/Files/geneticphylogeny.html


> > That is a
> > significant constraint wouldn't you say? In fact the most generous
> > estimates of the total number of possible cytochrome c sequences in
> > sequence space that I have come across, based on these constraints,
> > suggest no more than 10e60 cytochrome c sequences exist. If you know
> > something different than I have suggested here, please do show me
> > otherwise.
>
> That certainly is a large number of potential different cytochrome c
> sequences (that is, sequences with quite significant cytochrome c
> activity). And the question is, of what possible relevance is that
> knowledge to what you are saying wrt the impossibility of evolution?
> Unless, of course, your logic is that cytochrome c sequences arise by a
> completely random process from a random sequence?

This number has to do with the density of sequences with this type of
function in sequence space. Knowing the density of beneficial
functions is the only way to understand the potential and limits of
evolutionary processes since evolution works via a crossing of
functionally beneficial steppingstones to new types of functions. If
the steppingstones aren't close enough, evolution doesn't happen.

> > Now you ask, "What value do I give such numbers?" What does this
> > estimate mean? Given that the total number of possible sequences in
> > sequence space requiring a minimum of 80 amino acids is well over
> > 10e100, the ratio of cytochrome c sequences to non-cytochrome c
> > sequences is less than 1 in 10e40 sequences.
>
> SFW? You keep coming up with this bogus ratio that presumes that
> cytochrome c, or whatever protein or system one is talking about, is
> derived by starting from a random sequence and generating the final
> result by a completely random walk with no possible states of
> intermediate utility. And then you turn around and deny that that is
> what you are saying.

Not at all. There could be intermediate stepping stone functions
between the original starting point and a cytochrome c function.
However, the odds that these steppingstones are close enough to a
cytochrome c sequence or any other beneficial sequence within that
level of complexity is determined through an understanding of
functional densities of sequences within sequence space. Although
highly specified, the cytochrome c function does not require all that
many amino acids at minimum. Its function is certainly within
striking distance of what I suspect most "original" genomes would have
to begin with. However, going very far above this level of specified
complexity becomes a really big problem really fast.

> > This translates into an
> > average gap of 30 amino acid changes between various islands of
> > cytochrome c sequence clusters within sequence space.
>
> And how is this "average gap" calculated from those two numbers? Be
> precise.

This isn't higher math here. If the sequence space gap that must be
crossed is 10e40 then that works out to be a bit over 20^30, or
slightly over 30 specified amino acid positional changes on average.
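
The conversion described here amounts to taking a base-20 logarithm. The sketch below assumes that "10e40" is meant as 10^40 and that every position counted is fully specified (one acceptable amino acid out of 20); the function name is illustrative.

```python
from math import log


def equivalent_specified_positions(exponent_base10: float, alphabet: int = 20) -> float:
    """Number of fully specified positions n such that alphabet**n == 10**exponent_base10."""
    return exponent_base10 * log(10) / log(alphabet)


# A ratio of 1 in 10^40 expressed as a count of fully specified positions
print(equivalent_specified_positions(40))

# Sanity check with exact integers: 20^30 < 10^40 < 20^31
print(20**30 < 10**40 < 20**31)
```

This gives a little under 31 positions, which matches "slightly over 30". The calculation only re-expresses the ratio as a number of positions; whether that number can be read as an average mutational distance is the claim under dispute in the thread.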

> And is that or is that not the "average gap" between a
> *random* protein or a *random* sequence and some sequence that has
> cytochrome c activity, just as all your other calculations are based on
> putative gaps between *random* proteins and *random* sequences and some
> teleologically determined end point with the proviso that only a random
> walk with no functional intermediates is possible between the two states.
>
> Of course, as my little exercise with ebg shows, evolution does not work
> by starting with a *random* protein or a *random* sequence. And all
> that matters is the number of mutational steps required to generate a
> selectable activity even if that activity is not the end activity in
> question. IOW, your numbers are totally irrelevant. Not *even* wrong.
> But as useful as knowing the distance between earth and the furthest
> galaxy is to determining how one gets from L.A. to San Francisco.
> Utterly irrelevant.

It is not utterly irrelevant. Don't you understand the concept yet?
These estimates help determine the odds that L.A. will actually be as
close to San Francisco as it is. If the average distance from one
beneficial function to another is the size of the universe, that tells
you not to put too much money on the bet that starting with L.A., that
some new beneficial place like San Francisco, will be just a few
hundred miles away. That is what Las Vegas is all about - predicting
the average amount of time it takes someone to win. If the average
amount of time it takes for evolution to "win" a new type of function
at a particular level of complexity is a trillion years, what does it
matter if it happens to win tomorrow? In the end, the average is
still trillions of years. Just like gambling in Las Vegas, if you
keep playing the evolution game too long, even if you have an early
"win", you will eventually lose.

> > In other words,
> > if a particular organism that did not have a cytochrome c function but
> > would benefit from this type of function, its current proteins would
> > differ from the nearest potential cytochrome c sequence by an average
> > of over 30 amino acid positions.
>
> Evolution would not *start* with the average protein. It would start
> with the extreme end of the bell-shaped distribution of proteins that is
> only 1 or 2 amino acids away from a selectable function.

And what are the odds of that? At higher and higher levels of
complexity, the odds that a new type of beneficial function will only
be "just 1 or 2 steps away" become more and more remote in an
exponential fashion. The average distance is an extremely important
concept. It means that just because it happened once does not mean
that it will happen again, if ever. The averages get so enormously
small in short order that betting that evolution will succeed over
time becomes an exercise in insanity.

That's all for now . . .

Sean
www.naturalselection.0catch.com

Von Smith
Jan 3, 2004, 2:37:22 PM
seanpi...@naturalselection.0catch.com (Sean Pitman) wrote in message news:<80d0c26f.04010...@posting.google.com>...

I did an analysis of 43 Presidents of the United States and have
concluded that the minimum size needed to achieve a Presidency
function is 5'4" (the height of James Madison). No shorter person can
be President. Now, you can easily prove me wrong here by finding a
functional President who was less than 5'4". Do you know of such a
President who actually served at any time in the United States?

This is exactly the argument you are using here. Do you see the
problem with it? Not only is your logic flawed, but your conclusion
appears to be at odds with what is known about protein chemistry (if
posters such as Howard Hershey, Ian Musgrave, "sweetness", and
"Deaddog" can be trusted as knowledgeable in the field).

Until you substantiate your claims about minimum requirements with
something substantial, there is no need to rub your nose in a
counter-example. It is sufficient to note that your assertion is
neither supported nor consistent with what we know about how proteins
work. I have seen several posters explain to you how they actually
*do* work. I have not seen you draw upon, or for that matter even
demonstrate, any such knowledge.

One question I have is: even if there is some minimum length, how is
this length relevant to discussing the evolvability of a lactase
function from a gene coding for a protein more than twice that length?
Presumably you want to talk about the evolvability of a function
based on its putative density in some "sequence space", and I would
guess that you are trying to argue that the relevant sequence space is
that of the minimum length. But that makes no sense: the ebg protein
isn't searching the 400aa sequence space; it can't. So why, exactly,
does this "minimum length", even if it exists, interest you here?

Von Smith
Fortuna nimis dat multis, satis nulli.

Von Smith
Jan 3, 2004, 3:01:04 PM
> Howard Hershey <hers...@indiana.edu> wrote in message news:<3FF4957C...@indiana.edu>...
>

<snip>

>
> > > In other words,
> > > if a particular organism that did not have a cytochrome c function but
> > > would benefit from this type of function, its current proteins would
> > > differ from the nearest potential cytochrome c sequence by an average
> > > of over 30 amino acid positions.
> >
> > Evolution would not *start* with the average protein. It would start
> > with the extreme end of the bell-shaped distribution of proteins that is
> > only 1 or 2 amino acids away from a selectable function.
>
> And what are the odds of that?

Yes, exactly, what *are* the odds? Switching back momentarily to the
example of lactase activity in E. coli, there are at least two other
proteins out of the few thousands it produces that are *known* to be 1
or 2 amino acids away from a selectable lactase function. Is this
frequency consistent with the odds that you would have calculated
based on your methods? Apparently not, which suggests that there is
something wrong with your calculation.

One possibility is that both your underlying model and your
calculation are basically correct, but that there is some other
significant factor at work (such as intervention by an intelligent
agent) which affects observed frequency of evolving lactase functions.
For obvious subjective reasons, this is the possibility you prefer
(AIUI, your "spare tire" argument about the ebg gene).

Another is that your underlying model is right but your calculation is
wrong, and that the actual probability (or "density in sequence
space") is in fact on the order of 1 in a few thousand. This is the
explanation you have incorrectly attributed to me, although I do
suspect that the ratio of functional sequences is somewhat higher than
you want to acknowledge.

Yet another possibility is that your underlying model is wrong, in
which case your calculations, accurate or not, are irrelevant. This
seems to me the most likely explanation, the one most consistent with
what other posters have told you about how proteins work; it is a
possibility you do not seem to ever acknowledge or really address.


> At higher and higher levels of
> complexity, the odds that a new type of beneficial function will only
> be "just 1 or 2 steps away" become more and more remote in an
> exponential fashion. The average distance is an extremely important
> concept. It means that just because it happened once does not mean
> that it will happen again, if ever. The averages get so enormously
> small in short order that betting that evolution will succeed over
> time becomes an exercise in insanity.

Observation seems to be at odds with all your claims and
"calculations", and that requires explanation. Do you have one?

RobinGoodfellow
Jan 3, 2004, 9:06:46 PM
seanpi...@naturalselection.0catch.com (Sean Pitman) wrote in message news:<80d0c26f.03123...@posting.google.com>...

> RobinGoodfellow <lmuc...@yahoo.com> wrote in message news:<bsd7ue$r1c$1...@news01.cit.cornell.edu>...

[snip to the point where I left off]

> > Again, I totally agree that a simple Monte-Carlo
> > process is a pitiful model for evolution - but Sean's statistics are way
> > off even when applied to this model.
>
> I fail to see how you have supported this statement of yours. My
> statistics do seem to match not only the exponentially increasing
> ratios found in language systems like English

Oh, do they now? I think it's time we laid this little baseless
assertion of yours to rest. Of course, to do that, we need to
accurately model how languages actually "evolve", rather than apply
your strawman "one letter at a time" model to the concept. We know
that new words rarely arise in a language spontaneously. This is
especially true for longer words, that have a lower "density" in what
you would call "word space". Rather, such words are often borrowed
from other languages, or made by combining standard prefixes,
suffixes, and roots. Furthermore, many English words derive from
archaic versions of themselves, which are no longer used (i.e. are no
longer "beneficial"), but certainly were so in the past. Finally, the
function of words is to convey meaning - therefore, a set of words
that clearly conveys a meaning is still functional even if some of the
words are misspelt.

So, if you wish to model language via an evolutionary process (that
bears at least some semblance to actual biological evolution) you must
support the following mutational events: 1) one-letter changes (point
mutations), 2) insertion and deletion (genomic frame shifts), 3)
recombinations of shorter words, and standard prefixes, suffixes,
etc. (domain shuffles), 4) incorporation of words from other
languages (horizontal transfer), 5) word or word part duplication
(gene duplication - much more useful for biological systems than for
English language, since the latter deliberately tends to avoid
repetition). You must allow for misspellings, since they rarely alter
the meaning (function) of longer words and especially sentences, and
thus strings containing misspellings can remain perfectly functional.
Once you allow for this type of model, and give me a dictionary of
short starting functional words, I can easily generate a huge
repository of words and arbitrarily long meaningful sentences using
the above evolutionary operations, with a bare minimum of neutral
drift along the way. (e.g. I can go from "philosophy" to "teleology"
in just two simple steps, producing a new beneficial word as the
intermediary.) Now, I am not saying I can generate *any* word or
sentence that way - but I can meet any reasonable (for the English
language) complexity metric that you propose.

Note that the model I described above is a much better approximation
of biological evolution than the point mutations model that you insist
upon. And it would work on the English language quite well, despite
the fact that our language is not nearly as versatile as actual
biological systems. However, there is one essential feature of this
model that is true of biological systems as well, and that you
consistently fail to grasp, so let me spell it out: evolutionary
processes produce complex systems (i.e. words, proteins, sentences,
flagella) not by simple alterations to unrelated complex systems, but
by incrementally combining simpler functional components, until a
solution that yields a novel beneficial function is stumbled upon.
Then, simpler adjustments (such as point mutations) can take place,
making the new system *specific* for performing the novel beneficial
function.

> and information systems
> like functional proteins and genes, but they also match statistical
> programs used in computer software development and the like. This is
> why computers cannot evolve their own software programs beyond the
> lowest levels of functional complexity. To go very far beyond the
> informational complexity that they already have they require the
> intelligence and creativity of human programmers to get across these
> vast neutral gaps that simply cannot be searched out in any sort of
> reasonable amount of time by mindless processes.

It is rather amusing how you keep misinterpreting Lenski's research.
The whole point they have demonstrated is that evolution can produce
systems of high complexity as long as there is, to use your own words,
a "ladder of complexity" to climb. That is, the point of Lenski's
demonstration was that as long as there are beneficial functions of
intermediate complexity, evolutionary processes can combine the
components performing these functions to yield qualitatively
different, novel functions of higher and higher complexity. However,
if no such functions exist (there is no ladder of complexity - all
systems below a certain level of complexity are unable to perform a
beneficial function necessary for an organism), then evolutionary
processes cannot simply poof complex systems out of thin air.
Evolution can only operate on existing functional components,
incrementally re-combining and modifying them to occasionally produce
something new: it doesn't just generate brand-new systems from
scratch. It is therefore an easily falsifiable prediction of
evolution that living organisms are not going to contain many large
functional molecules or molecular systems that look like they've been
generated completely from scratch. Care to falsify this prediction?

As for the need for intelligent input in the Lenski experiment - the
reason they needed to specify the functions of intermediate utility is
because they have been trying to attain a specific teleological goal,
and needed certain steps to reach this goal. If rather than achieving
a certain goal, they had set up their experiment so as to attain a
function of a certain level of complexity, they would not need to
specify functions at intermediate levels of complexity explicitly -
just to ensure that a sufficiently wide spectrum of such functions
exists (as it certainly does in biology). But this is a much harder
computational problem that presents interesting venues for further
ALIFE research. Which begs the question as to why no creationists or
IDers seem to be taking part in it? (to my knowledge - please correct
me if I am wrong.)

> So, please do show me
> how your Monte-Carlo technique can search an increasing state space
> and find beneficial states in a linear fashion with each increase in
> the minimum informational complexity requirement.

I never said that it could. It is you who believes that evolution
works in such fashion, not I. What I said is that your calculations
do NOT, in any way, demonstrate that it can't. They are completely
irrelevant for the problem you are trying to analyze, but you simply
do not realize it. Even if your assertions that evolution cannot
produce systems at a certain level of complexity are correct (and that
such levels of complexity are indeed regularly seen in biological
organisms), you do not have the statistics to back it up. Don't feel
too bad, though - no one does. The problem is far too complicated to
analyze statistically at the moment: it will require far greater
levels of understanding of biological systems than we currently
possess, along with far better computational tools than are currently
available. Strides are being taken in this direction, however, and
they do not bode too well for your position. I'll get to that below.

> > His probabilities would only be
> > valid if evolution worked by repeatedly generating N-amino-acid
> > sequences de novo *every time*, with selection only keeping "beneficial"
> > sequences. That is, Sean's calculation reflects the probability of
> > finding a desired state with property X by blind, uniform random
> > sampling of N-dimensional space. The best thing that could be said
> > about such an attempt to model evolution is that it is laughable.
>
> How is this laughable when you evolutionists can't seem to come up
> with any other way to explain how new types of functions that require
> at least a couple thousand fairly specified amino acids working
> together at the same time can evolve?

First of all, you haven't even come close to demonstrating that there
are functions that *require* at least a couple of thousand *fairly
specified* amino acids working together to evolve. When presented with examples
of very large proteins evolving, you quickly (and, as far as I can
tell, correctly) pointed out that most of the amino acids are at
functionally neutral positions, and thus are allowed to vary. But
you've done nothing to demonstrate that some 2000 amino acids in the
proteins composing, say, the flagellum need to be "fairly specified".
You've tried to do something like this for cytochrome c, but even
there your analysis badly fails. You claim that a majority of
positions in cytochrome c are constrained to only a few amino acids,
making cytochrome c relatively specific for electron transport.
However, that does not mean that the precursor protein performing the
rudimentary electron transport function required highly specific amino
acids. Rather, once rudimentary electron transport was developed in
the common ancestor, subsequent mutations produced variants that could
perform the function more efficiently. Such mutations would not be
reversible, since a reverse mutation would yield an electron
transporter of lesser efficiency, and would be selected out. As the
result, in all modern organisms, the electron transport function is
relatively constrained - though it did not need to be the case for
ancestral organisms. Note that such functions are by far the
exception rather than the rule, and there is a perfectly workable
model for explaining how they came about.

Secondly, note what I referred to as "laughable". The preceding
sentence was (I paraphrase) "Sean's calculation reflects the
probability of finding
a desired beneficial state by blind, uniform random sampling of
N-dimensional space." That is, the search technique for which your
calculations apply is "generate a completely random N-amino acid
sequence, see if it works". But there is not even a biological
mechanism which could be used to generate long polypeptide chains
completely de novo, and do so repeatedly and at random. In other
words, you are proposing (probably without realizing it) to model
evolution using non-existent biological mechanisms. That is what I
find laughable.

> What method do you propose to
> explain such levels of functional diversity within living things? How
> do you get from one type of function at such a level to another type
> of function within this same level of specified complexity?

You generally don't. That's the key point. Complex functional
systems do not result from modification of equally complex, unrelated
functional systems. They are built up from the combination and
modification of simpler components, that already exhibit some
beneficial function for the organism. Specificity comes about even
later, as the new functions themselves start to play an important role
in the organism, and selection takes over with respect to mutations
affecting these functions.

> > But
> > it appears that Sean does not realize that he is making this mistake,
> > and keeps repeating his probabilities like a broken record.
>
> And you guys keep repeating your non-supported assertions like a
> mantra. You keep saying I'm crazy and that my ideas are laughable,
> but you have presented nothing to significantly counter my position.

I am pointing out that you are making elementary statistical errors,
and are not even realizing that you are doing so. Your ideas may be
100% correct for all I care, but, from a statistical point of
view, you've done nothing to back them up.

> My hypothesis remains untouched and my predictions still hold.

Actually, many posters (Ian Musgrave, sweetness, Deaddog, Howard
Hershey) have presented you with a variety of examples of evolution
that could conceivably refute your hypothesis. You have dismissed
them, identifying potential weaknesses in their arguments. However,
you've refused to apply the same criteria (e.g. count only the
conserved, "specified" amino acids, rather than the total number of
amino acids) in your so-called unevolvable systems. You've done
nothing to revise or adjust your argument in light of the evidence
you've been shown. That does not a credible position make.

> What have you presented besides a bunch of non-supported "just-so" and
> "trust me" statements? Where is your falsifiable evidence?

All around you. I gave you one above - very few large proteins are
going to look like they've been produced "de novo" (i.e. will have no
sequence or structural homologs), instead of the result of domain
shuffling, duplication and divergence, etc. Care to falsify it?

As for your claims, what sort of falsifiable evidence would you accept
(since you've rejected everything presented to you so far)? Real-time
evolution of a flagellum-like motility system? Given the time scales
it is thought to take in real life, that does not seem possible. Even
if it were, we would first need to learn its exact evolutionary
history, the environmental conditions under which it arose, etc. In
an experiment, we would need to replicate these things precisely - and
then, what would stop you or some other IDer from claiming that it was
the intelligent input involved in the experimental setup that gave
rise to the flagellum, rather than so-called mindless processes (just
as you do in the case of Lenski's research)?

> The best
> that I can see is that you guys keep falling back on the philosophical
> position that given enough time anything is possible via the
> extraordinary creativity of The Mindless - even beyond the most
> miraculous creations of mankind.

The position that given enough time anything that has a non-zero
probability of occurring will occur is not philosophical: it is
mathematically demonstrable, as you would know if you had a solid
background in probability and statistics. But that is not the
question. The question is: was there enough time, and enough raw
material to work with, for evolution to produce what we see in nature
today. If evolution worked the way you think it works, the answer
would be a most resounding "no". However, given the way evolution
actually does operate, what we have seen it (and similar processes) do
on short time scales, and what we've extrapolated to geological time
frames, it appears very likely that the answer is "yes".
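
The "non-zero probability plus enough time" statement above can be illustrated with a short sketch. It assumes independent trials with a fixed per-trial probability p, which is an idealization introduced here for illustration and not a model of evolution; the function name is hypothetical.

```python
import math

def prob_at_least_once(p: float, n: int) -> float:
    """Probability that an event with per-trial probability p occurs
    at least once in n independent trials: 1 - (1 - p)**n.
    Computed with log1p/expm1 so that tiny p does not lose precision."""
    return -math.expm1(n * math.log1p(-p))

# Assumed illustrative numbers: a one-in-a-million event.
for n in (10**4, 10**6, 10**8):
    print(n, prob_at_least_once(1e-6, n))
```

Under these assumptions the probability climbs toward 1 as n grows, which is the mathematical point being made; whether n is large enough in practice is the separate question raised in the paragraph above.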

> Basically evolution explains everything - even without demonstration -
> and therefore nothing.

Oh, really now? Does evolution explain why apples fall down to the
ground, and not fly straight up? Does evolution explain why covalent
bonds are stronger than ionic bonds? ID explains all those things
trivially: but evolution only explains a range of observations in
biology, and does so with as much detail as possible given the current
level of knowledge. Your claim that it does so without demonstration
is outrageous, as many aspects have indeed been demonstrated. We have
seen the evolution of novel enzymes, of small multi-protein systems,
the rise of new species, the agreement of genetic evidence with major
evolutionary predictions, the application of evolutionary methods to the
solution of complex problems in other fields. All in a short, short span of
100 years. Just because we can't demonstrate within a span of a few
years every little thing that an evolution denier demands (especially
when such demands are completely unreasonable), doesn't mean that the
theory is broken. Apply the same standard to, say, General
Relativity, and you can toss that theory out in two seconds flat.

> It is a weak historical hypothesis at best.
> It is not falsifiable by any sort of real time genetic experiment -
> such as a Pasteur-like experiment. Every time a prediction fails, you
> evolutionists just fall back on your philosophy and say, "Oh well, I
> guess that particular level of evolution just requires millions of
> years - but it certainly happened within 4 billion years that's for
> sure!" Really, there is no way to falsify such a philosophical
> position.

Baloney. Again, above I gave you a prediction of evolution that can
easily be falsified. On a more basic level, the absence of the twin
nested hierarchy would be a fatal blow to evolution. The lack of a
mechanism to introduce novelty into the genome, or to recombine
functional genetic elements, would again be a death knell. The
preponderance of chimeric life-forms, especially among higher
organisms, would once again pose an insurmountable obstacle to the
theory as it currently stands. These are just a few ways to falsify
evolution off the top of my head. Now, why don't you give me just one
way to falsify ID?

> Statistically you have nothing.

No. Statistically, you have nothing. Statistically, we have quite a
bit, if you bothered to look. We have population genetics models that
verify that observed frequencies of individual genes in populations
are consistent with observed evolutionary rates of change. We have
rudimentary models of protein evolution, showing how the observed
range and diversity of protein structures is consistent with
evolutionary processes (Deeds et al., Biophys J, 85(5):2962-72). We
have studies showing how complex biochemical networks arise in the
genome from evolutionary processes (Von Mering et al., PNAS,
100(26):15428-33). We have real-time computational demonstration of
how complex functions can arise as the result of evolutionary change
(like Lenski's work, that you insist on misinterpreting). That, along
with hundreds of detailed evolutionary analyses of a broad spectrum of
organismal lineages, on both short and very long time scales. What we
do not yet have is a unified, computationally verifiable model of
evolution over geological timescales. Probably, we won't have such a
model for a while, since there are considerable challenges yet to
overcome - but as I said, steps are being taken in this direction. It
might even turn out (highly unlikely though it is) that when we get
there, and plug the model in, it is you who will turn out to be right.
But if that is the case, we won't have you to thank for it - since
neither you, nor any creationist or IDer to my knowledge, has done or is
doing anything to demonstrate it.

> Statistically it is very
> clear that evolution, as an explanation for the variety and levels of
> functional complexity that we find in all living things, is simply
> untenable.

Only if you do not understand the statistics involved. Which, I am
sorry to say, you do not - as I hope I've demonstrated above, and in
my previous post.

> Of course, you are free to hold whatever philosophical
> position that you want, but if you hope to convince those who actually
> wish to consider the statistical problems involved, you will have to
> do much better than you have done so far to hold onto your illusions
> of "scientific superiority".

All I can say is that those who wish to consider the statistical
problems involved might benefit from understanding the statistical
problems involved. If you really think all you need do to model
evolutionary processes is to raise 20 to the power N, you really,
really, really, have a lot to learn.

> Sean
> www.naturalselection.0catch.com

Cheers,
RobinGoodfellow.

Sean Pitman
Jan 3, 2004, 9:18:17 PM
lmuc...@yahoo.com (RobinGoodfellow) wrote in message news:<81fa9bf3.04010...@posting.google.com>...

> seanpi...@naturalselection.0catch.com (Sean Pitman) wrote in message news:<80d0c26f.03123...@posting.google.com>...
>
> Good gravy! That was so wrong, it feels wrong to even use the word
> "wrong" to describe it. All I can recommend is that you run, don't
> walk, to your nearest college or university, and sign up as quickly as
> you can for a few math and/or statistics courses: I especially
> recommend courses in probability theory and stochastic modelling.
> With all due respect, Sean, I am beginning to see why the biologists
> and biochemists in this group are so frustrated with you: my
> background in those fields is fairly weak - enough to find your
> arguments unconvincing but not necessarily ridiculous - but if you are
> as weak with biochemistry as you are with statistical and
> computational problems, then I can see why knowledgeable people in
> those areas would cringe at your posts.

With all due respect, what is your area of professional training? I
mean, after reading your post I dare say that you are not only weak in
biology, but statistics as well. Certainly your numbers and
calculations are correct, but the logic behind your assumptions is
extraordinarily fanciful. You sure wouldn't get away with such
assumptions in any sort of peer reviewed medical journal or other
statistically based science journal - that's for sure. Of course, you
may have good success as a novelist . . .

> I'll try to address some of the mistakes you've made below, though I
> doubt that I can do much to dispel your misconceptions. Much of my
> reply will not even concern evolution in a real sense, since I wish to
> highlight and address the mathematical errors that you are making.

What you ended up doing is highlighting your misunderstanding of
probability as it applies to this situation as well as your amazing
faith in an extraordinary stacking of the deck which allows evolution
to work as you envision it working. Certainly, if evolution is true
then you must be correct in your views. However, if you are correct
in your views as stated then it would not be evolution via mindless
processes alone, but evolution via a brilliant intelligently designed
stacking of the deck.

> > RobinGoodfellow <lmuc...@yahoo.com> wrote in message news:<bsd7ue$r1c$1...@news01.cit.cornell.edu>...
>
> > > It is even worse than that. Even random walks starting at random points
> > > in N-dimensional space can, in theory, be used to sample the states
> > > with a desired property X (such as Sean's "beneficial sequences"), even
> > > if the number of such states is exponentially small compared to the
> > > total state space size.
> >
> > This depends upon just how exponentially small the number of
> > beneficial states is relative to the state space.
>
> No, it does not. If you take away anything from this discussion, it
> has to be this: the relative number of beneficial states has virtually
> no bearing on the amount of time a local search algorithm will need to
> find such a state.

LOL - You really don't have a clue how insane this statement is?

> The things that *would* matter are the
> distribution of beneficial states through the state space, the types
> of steps the local search is allowed to take (and the probabilities
> associated with each step), and the starting point.

This distribution of states has very little if anything to do with how
much time it takes to find one of them on average. The starting point
certainly is important to initial success, but it also has very little
if anything to do with the average time needed to find more and more
beneficial functions within that same level of complexity. For
example, if all the beneficial states were clustered together in one
or two areas, the average starting point, if anything, would be
farther away than if these states were distributed more evenly
throughout the sequence space. So, this leaves the only really
relevant factor - the types of steps and the number of steps per unit
of time. That is the only really important factor in searching out
the state space - on average.

> For an extreme
> example, consider a space of strings consisting of length 1000, where
> each position can be occupied by one of 10 possible characters.

Ok. This would give you a state space of 10 to the power of 1000 or
1e1000. That is an absolutely enormous number.

> Suppose there are only two beneficial strings: ABC........, and
> BBC........ (where the dots correspond to the same characters). The
> allowed transitions between states are point mutations, that are
> equally probable for each position and each character from the
> alphabet. Suppose, furthermore, that we start at the beneficial state
> ABC. Then, the probability of a transition from ABC... to BBC... in a
> single mutation is 1/(10*1000) = 1/10000 (assuming self-loops - i.e.
> mutations that do not alter the string, are allowed).
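
For reference, the single-step figure in the quoted example can be checked with a small sketch. It assumes the mutation model exactly as quoted (a uniformly chosen position, a uniformly chosen replacement character, self-loops allowed). The function names and the scaled-down simulation parameters are illustrative assumptions, not anything posted in this thread.

```python
from fractions import Fraction
import random

def single_step_probability(length: int, alphabet_size: int) -> Fraction:
    """Chance that one point mutation lands on one specific position
    AND picks one specific character (self-loops allowed)."""
    return Fraction(1, length * alphabet_size)

def mean_steps_with_restart(p: float, runs: int, seed: int = 0) -> float:
    """Simulate a walk that restarts at the start state after every step.
    Each step independently hits the target neighbour with probability p.
    Returns the mean number of steps taken until the first hit."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        steps = 1
        while rng.random() >= p:
            steps += 1
        total += steps
    return total / runs

p = single_step_probability(1000, 10)
print(p)        # per-step probability for the quoted example
print(1 / p)    # expected number of steps, mean of a geometric distribution

# Scaled-down check (length 20, alphabet 5, so p = 1/100) to keep it fast.
print(mean_steps_with_restart(float(single_step_probability(20, 5)), runs=2000))
```

The expected count is simply 1/p because each restart is an independent trial. The sketch says nothing about how likely it is that a beneficial neighbour exists in the first place, which is the point in dispute below.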

You are good so far. But, you must ask yourself this question: What
are the odds that out of a sequence space of 1e1000 the only two
beneficial sequences with uniquely different functions will have a gap
between them of only 1 in 10,000? The time required to cross this
tiny gap would require a random walk of only 10,000 steps on average.
For a decent sized population, this could be done in just one
generation.

Don't you see the problem with this little scenario of yours?
Certainly this is a common mistake made by evolutionists, but it is
nonetheless a fallacy of logic. What you have done is assume that
the density of beneficial states is unimportant to the problem of
evolution since it is possible to have the beneficial states clustered
around your starting point. But such a close proximity of beneficial
states is highly unlikely. On average, the beneficial states will be
more widely distributed throughout the sequence space.

For example, say that there are 10 beneficial sequences in this
sequence space of 1e1000. Now say one of these 10 beneficial
sequences just happens to be one change away from your starting point
and so the gap is only a random walk of 10,000 steps as you calculated
above. However, on average, how long will it take to find any one of
the other 9 beneficial states? That is the real question. You rest
your faith in evolution on this inane notion that all of these states
will be clustered around your starting point. If they were, that
certainly would be a fabulous stroke of luck - like it was *designed*
that way. But, in real life, outside of intelligent design, such
strokes of luck are so remote as to be impossible for all practical
purposes. On average we would expect that the other nine sequences
would be separated from each other and our starting point by around
1e999 random walk steps/mutations (i.e., on average it is reasonable
to expect there to be around 999 differences between each of the 10
beneficial sequences). So, even if a starting sequence did happen to
be so extraordinarily lucky to be just one positional change away from
one of the "winning" sequences, the odds are that this luck will not
hold up as well in the evolution of any of the other 9 "winning"
sequences this side of a practical eternity of time.

Real time experiments support this position rather nicely. For
example, a recent and very interesting paper was published by Lenski
et al., entitled, "The Evolutionary Origin of Complex Features" in
the 2003 May issue of Nature. In this particular experiment the
researchers studied 50 different populations, or genomes, of 3,600
individuals. Each individual began with 50 lines of code and no
ability to perform "logic operations". Those that evolved the ability
to perform logic operations were rewarded, and the rewards were larger
for operations that were "more complex". After only 15,873 generations,
23 of the genomes yielded descendants capable of carrying out the most
complex logic operation: taking two inputs and determining if they are
equivalent (the "EQU" function).

In principle, 16 mutations (recombinations) coupled with the three
instructions that were present in the original digital ancestor could
have combined to produce an organism that was able to perform the
complex equivalence operation. According to the researchers themselves,
"Given the ancestral genome of length 50 and 26 possible instructions
at each site, there are ~5.6 x 10^70 genotypes [sequence space]; and
even this number underestimates the genotypic space because length
evolves."
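
The genotype count in that quote is just 26 choices at each of 50 sites, which a one-line check confirms. This is a sketch of the arithmetic only; the variable names are mine.

```python
# 26 possible instructions at each of 50 sites in the ancestral genome.
instructions_per_site = 26
genome_length = 50

genotypes = instructions_per_site ** genome_length  # exact integer
print(f"{float(genotypes):.1e}")  # on the order of 5.6 x 10^70
```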

Of course this sequence space was overcome in smaller steps. The
researchers arbitrarily defined 6 other sequences as beneficial (NAND,
AND, OR, NOR, XOR, and NOT functions). The average gap between these
pre-defined steppingstone sequences was 2.5 steps, translating into an
average search space between beneficial sequences of only 3,400 random
walk steps. Of course, with a population of 3,600 individuals in a
population, a random walk of 3,400 will be covered in short order by
at least one member of that population. And, this is exactly what
happened. The average number of mutations required to cross the
16-step gap was only 103 mutations per population.

Now that is lightning fast evolution. Certainly if real life
evolution were actually based on this sort of setup then evolution of
novel functions at all levels of complexity would be a piece of cake.
Of course, this is where most descriptions of this most interesting
experiment stop. But, what the researchers did next is the most
important part of this experiment.

Interestingly enough, Lenski and the other scientists went on to set
up different environments to see which environments would support the
evolution of all the potentially beneficial functions - to include the
most complex EQU function. Consider the following description about
what happened when various intermediate steps were not arbitrarily
defined by the scientists as "beneficial".

"At the other extreme, 50 populations evolved in an environment where
only EQU was rewarded, and no simpler function yielded energy. We
expected that EQU would evolve much less often because selection would
not preserve the simpler functions that provide foundations to build
more complex features. Indeed, none of these populations evolved EQU,
a highly significant difference from the fraction that did so in the
reward-all environment (P = 4.3 x 10^-9, Fisher's exact test).
However, these populations tested more genotypes, on average, than did
those in the reward-all environment (2.15 x 10^7 versus 1.22 x 10^7;
P<0.0001, Mann-Whitney test), because they tended to have smaller
genomes, faster generations, and thus turn over more quickly. However,
all populations explored only a tiny fraction of the total genotypic
space. Given the ancestral genome of length 50 and 26 possible
instructions at each site, there are ~5.6 x 10^70 genotypes; and even
this number underestimates the genotypic space because length
evolves."
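
The P value quoted for Fisher's exact test can be roughly reproduced from the two counts given in this post (23 of 50 populations evolving EQU in the reward-all environment, 0 of 50 in the EQU-only environment). The sketch below computes the hypergeometric probability of that exact table; it assumes two groups of equal size, and the function name is hypothetical. Whether the paper used a one-sided or two-sided test is not stated in the quote, so treat this as an order-of-magnitude check.

```python
from math import comb

def prob_all_successes_in_one_group(n_per_group: int, successes: int) -> float:
    """Hypergeometric probability that, with two groups of n_per_group each
    and a fixed total number of successes, every success falls in the
    first group and none in the second."""
    total = 2 * n_per_group
    return comb(n_per_group, successes) / comb(total, successes)

p = prob_all_successes_in_one_group(50, 23)
print(p)  # roughly 4 x 10^-9
```

Because 0 successes in one group is the most extreme table possible, this single term is the whole one-sided tail.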

Isn't that just fascinating? When the intermediate stepping stone
functions were removed, the neutral gap that was created successfully
blocked the evolution of the EQU function, which happened *not* to be
right next door to their starting point. Of course, this is only to
be expected based on statistical averages that go strongly against the
notion that very many possible starting points would just happen to be
very close to an EQU functional sequence in such a vast sequence
space.

Now, isn't this consistent with my predictions? This experiment was
successful because the intelligent designers were capable of defining
what sequences were "beneficial" for their evolving "organisms." If
enough sequences are defined as beneficial and they are placed in just
the right way, with the right number of spaces between them, then
certainly such a high ratio will result in rapid evolution - as we saw
here. However, when neutral non-defined gaps are present, they are a
real problem for evolution. In this case, a gap of just 16 neutral
mutations effectively blocked the evolution of the EQU function.

http://naturalselection.0catch.com/Files/computerevolution.html

> Thus, a random
> walk that restarts each time after the first step (or alternatively, a
> random walk performed by a large population of sequences, each
> starting at state ABC...) is expected to explore, on average, 10000
> states before finding the next beneficial sequence.

Yes, but you are failing to consider the likelihood that your "winning
sequence" will in fact be within these 10,000 steps on average.

> Now, below, we
> will apply your model to the same problem.

Oh, I can hardly wait!

> > It also depends
> > upon how fast this space is searched through. For example, if the
> > ratio of beneficial states to non-beneficial states is as high as say,
> > 1 in 1e12, and if 1e9 states are searched each second, how long will
> > it take, on average, to find a new beneficial state?
>
> OK. Let's take my example, instead, and apply your calculations.
> There are only 2 beneficial sequences, out of the state space of
> 1e1000 sequences.

Ok, I'm glad that you at least realize the size of the state space.

> Since the ratio of beneficial sequences to
> non-beneficial ones is (2/10^1000), if your "statistics" are correct,
> then I should be exploring 10^1000/2 states, on average, before
> finding the next beneficial state. That is a huge, huge, huge number.
> So why does my very simple random walk explore only 10,000 states,
> when the ratio of beneficial sequences is so small?

Yes, that is the real question and the answer is very simple - You
either got unbelievably lucky in the positioning of your start point
or your "beneficial" sequences were clustered by intelligent design.

> The answer is simple - the ratio of beneficial states does NOT matter!

Yes it does. You are ignoring the highly unlikely nature of your
scenario. Tell me, how often do you suppose your start point would
just happen to be so close to the only other beneficial sequence in
such a huge sequence space? Hmmmm? I find it just extraordinary that
you would even suggest such a thing as "likely" with all sincerity of
belief. The ratio of beneficial to non-beneficial in your
hypothetical scenario is absolutely minuscule and yet you still have
this amazing faith that the starting point will most likely be close
to the only other "winning" sequence in an absolutely enormous
sequence space?! Your logic here is truly mysterious and your faith
is most impressive. I'm sorry, but I just can't get into that boat
with you. You are simply beyond me.

> All that matters is their distribution, and how well a particular
> random walk is suited to explore this distribution.

Again, you must consider the odds that your "distribution" will be so
fortuitous as you seem to believe it will be. In fact, it has to be
this fortuitous in order to work. It basically has to be a set up for
success. The deck must be stacked in an extraordinary way in your
favor in order for your position to be tenable. If such a stacked
deck happened at your table in Las Vegas you would be asked to leave
the casino in short order or be arrested for "cheating" by intelligent
design since such deck stacking only happens via intelligent design.
Mindless processes cannot stack the deck like this. It is
statistically impossible - for all practical purposes.

> (Again, it is a
> gross, meaningless over-simplification to model evolution as a random
> walk over a frozen N-dimensional sequence space, but my point is that
> your calculations are wrong even for that relatively simple model.)

Come now Robin - who is trying to stack the deck artificially in their
own favor here? My calculations are not based on the assumption of a
stacked deck like your calculations are, but upon a more likely
distribution of beneficial sequences in sequence space. The fact of
the matter is that sequence space does indeed contain vastly more
absolutely non-beneficial sequences than it does those that are even
remotely beneficial. In fact, there is an entire theory called the
"Neutral Theory of Evolution". Of all mutations that occur in every
generation in say, humans (around 200 to 300 per generation), the
large majority of them are completely "neutral" and those few that are
functional are almost always detrimental. This ratio of beneficial to
non-beneficial is truly small and gets exponentially smaller with each
step up the ladder of specified functional complexity. Truly,
evolution gets into very deep weeds very quickly beyond the lowest
levels of functional/informational complexity.

> > It will take
> > just over 1,000 seconds - a bit less than 20 minutes on average. But,
> > what happens if at higher levels of functional complexity the density
> > of beneficial functions decreases exponentially with each step up the
> > ladder? The rate of search stays the same, but the junk sequences
> > increase exponentially and so the time required to find the rarer and
> > rarer beneficial states also increases exponentially.
>
> The above is only true if you use the following search algorithm:
>
> 1. Generate a completely random N-character sequence
> 2. If the sequence is beneficial, say "OK";
> Otherwise, go to step 1.

Actually the above is also true if you start with a likely starting
point. A likely starting point will be an average distance away from
the next closest beneficial sequence. A random mutation to a sequence
that does not find the new beneficial sequence will not be selectable
as advantageous and a random walk will begin.

> For an alphabet of size S, where only k characters are "beneficial"
> for each position, the above search algorithm will indeed need to explore
> exponentially many states in N (on average, (S/k)^N), before finding a
> beneficial state. But, this analysis applies only to the above search
> algorithm - an extremely naive approach that resembles nothing that
> is going on in nature.
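
The (S/k)^N figure in the quoted paragraph, and the earlier "1 in 1e12 at 1e9 states per second" estimate, both follow from the same fact: under uniform, independent sampling the expected number of draws is 1/p. The sketch below checks this on a deliberately tiny case. The parameters and function names are illustrative assumptions, and the code models only the generate-and-test algorithm as quoted, not any biological process.

```python
import random

def expected_trials(S: int, k: int, N: int) -> float:
    """Expected draws for generate-and-test when k of S characters are
    acceptable at each of N positions: hit probability (k/S)**N per draw."""
    return (S / k) ** N

def simulate_generate_and_test(S: int, k: int, N: int,
                               runs: int, seed: int = 0) -> float:
    """Repeatedly draw whole random sequences until one is acceptable.
    A position counts as acceptable when its character index is below k.
    Returns the mean number of draws over all runs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        draws = 0
        while True:
            draws += 1
            if all(rng.randrange(S) < k for _ in range(N)):
                break
        total += draws
    return total / runs

print(expected_trials(4, 1, 3))                        # 64.0
print(simulate_generate_and_test(4, 1, 3, runs=2000))  # close to 64

# The quoted search-time estimate: density 1 in 1e12, 1e9 draws per second.
print(1e12 / 1e9, "seconds on average")
```

The exponential growth in N is a property of this particular sampling scheme. A local search that modifies an existing sequence is not covered by the formula, which is the distinction being argued in this exchange.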

Oh really? How do you propose that nature gets around this problem?
How does nature stack the deck so that its starting point is so close
to all the beneficial sequences that otherwise have such a low density
in sequence space?

> The above algorithm isn't even a random walk
> per se, since random walks make local modifications to the current
> state, rather than generate entire states anew.

The random walk I am talking about does indeed make local
modifications to a current sequence. However, if you want to get from
the type of function produced by one state to a new type of function
produced by a different state/sequence, you will need to eventually
leave your first state and move onto the next across whatever neutral
gap there might be in the way. If a new function requires a sequence
that does not happen to be as fortuitously close to your starting
sequence as you like to imagine, then you might be in just a bit of a
pickle. Please though, do explain to me how it is so easy to get from
your current state, one random walk step at a time, to a new state
with a new type of function when the density of beneficial sequences
of the new type of function are extraordinarily infinitesimal?

> A random walk
> starting at a given beneficial sequence, and allowing certain
> transitions from one sequence to another, would require a completely
> different type of analysis. In the analyses of most such search
> algorithms, the "ratio" of beneficial sequences would be irrelevant -
> it is their *distribution* that would determine how well such an
> algorithm would perform.

The most likely distribution of beneficial sequences is determined by
their density/ratio. You cannot simply assume that the deck will be
so fantastically stacked in the favor of your neat little evolutionary
scenario. I mean really, if the deck was stacked like this with lots
of beneficial sequences neatly clustered around your starting point,
evolution would happen very quickly. Of course, there have been those
who propose the "Baby Bear Hypothesis". That is, the clustering is
"just right" so that the theory of evolution works. That is the best
you can hope for. Against all odds the deck was stacked just right so
that we can still believe in evolution. Well, if this were the case
then it would still be evolution by design. Mindless processes just
can't stack the deck like you are proposing.

> My example above demonstrates a problem
> where the ratio of beneficial states is extremely tiny, yet the
> search finds a new beneficial state relatively quickly.

Yes - because you stacked the deck in your favor via deliberate
design. You did not even try to explain the likelihood of this
scenario in real life. How do you propose that this is even a remote
reflection of what mindless processes are capable of? I'm talking
average probabilities here while you are talking about extraordinarily
unlikely scenarios that are basically impossible outside of deliberate
design.

> I could also
> very easily construct an example where the ratio is nearly one, yet a
> random walk starting at a given beneficial sequence would stall with a
> very high probability.

Oh really? You can construct a scenario where all sequences are
beneficial and yet evolution cannot evolve a new one? Come on now . .
. now you're just being silly. But I certainly would like to see you
try and set up such a scenario. I think it would be most
entertaining.

> In other words, Sean, your calculations are
> irrelevant for the kind of problem you are trying to analyze.

Only if you want to bury your head in the sand and force yourself to
believe in the fairytale scenarios that you are trying to float.

> If you
> wish to model evolution as a random walk of point mutations on a
> frozen N-dimensional sequence space, you will need to apply a totally
> different statistical analysis: one that takes into account the
> distributions of known "beneficial" sequences in sequence space. And
> then I'll tell you why that model too is so wrong as to be totally
> irrelevant.

And if you wish to model evolution as a walk between tight clusters of
beneficial sequences in an otherwise extraordinarily low density
sequence space, then I have some oceanfront property in Arizona to
sell you at a great price.

Until then, this is all I have time for today.

> Cheers,
> RobinGoodfellow.

Sean
www.naturalselection.0catch.com

RobinGoodfellow
Jan 4, 2004, 3:15:15 AM
Sean Pitman wrote:

> lmuc...@yahoo.com (RobinGoodfellow) wrote in message news:<81fa9bf3.04010...@posting.google.com>...
>
>>seanpi...@naturalselection.0catch.com (Sean Pitman) wrote in message news:<80d0c26f.03123...@posting.google.com>...
>>

[snip]

> With all due respect, what is your area of professional training? I
> mean, after reading your post I dare say that you are not only weak in
> biology, but statistics as well. Certainly your numbers and
> calculations are correct, but the logic behind your assumptions is
> extraordinarily fanciful. You sure wouldn't get away with such
> assumptions in any sort of peer reviewed medical journal or other
> statistically based science journal - that's for sure. Of course, you
> may have good success as a novelist . . .

Tsk, tsk... I thank you for the career advice. I'll keep it in mind,
should my current stint in computer science fall through. I wouldn't go
so far as to say that Monte-Carlo methods are my specialty, but I will
say that my own research and the research of half my colleagues would be
non-existent if they worked the way you think they do.

>>I'll try to address some of the mistakes you've made below, though I
>>doubt that I can do much to dispel your misconceptions. Much of my
>>reply will not even concern evolution in a real sense, since I wish to
>>highlight and address the mathematical errors that you are making.
>
>
> What you ended up doing is highlighting your misunderstanding of
> probability as it applies to this situation as well as your amazing
> faith in an extraordinary stacking of the deck which allows evolution
> to work as you envision it working. Certainly, if evolution is true
> then you must be correct in your views. However, if you are correct
> in your views as stated then it would not be evolution via mindless
> processes alone, but evolution via a brilliant intelligently designed
> stacking of the deck.

Exactly what views did I state, Sean? Other than that your calculations
are, to put it plainly, irrelevant. Not even wrong - just irrelevant.

Yes, the example I give below incredibly stacks the deck in my favor.
It ought to. It is what is called a "counter-example". It falsifies
the hypothesis that your "model" of evolution is correct. Now aren't
you glad you proposed something falsifiable?

>
>>>RobinGoodfellow <lmuc...@yahoo.com> wrote in message news:<bsd7ue$r1c$1...@news01.cit.cornell.edu>...
>>
>>
>>
>>>>It is even worse than that. Even random walks starting at random points
>>>>in N-dimensional space can, in theory, be used to sample the states
>>>>with a desired property X (such as Sean's "beneficial sequences"), even
>>>>if the number of such states is exponentially small compared to the
>>>>total state space size.
>>>
>>>This depends upon just how exponentially small the number of
>>>beneficial states is relative to the state space.
>>
>>No, it does not. If you take away anything from this discussion, it
>>has to be this: the relative number of beneficial states has virtually
>>no bearing on the amount of time a local search algorithm will need to
>>find such a state.
>
>
> LOL - You really don't have a clue how insane this statement is?

When you're done laughing, would you care to explain to me why it is
insane? Especially when I can construct examples (and, if you so wish,
give you examples of real-world problems) that show this statement is true?

>>The things that *would* matter are the
>>distribution of beneficial states through the state space, the types
>>of steps the local search is allowed to take (and the probabilities
>>associated with each step), and the starting point.
>
> This distribution of states has very little if anything to do with how
> much time it takes to find one of them on average. The starting point
> certainly is important to initial success, but it also has very little
> if anything to do with the average time needed to find more and more
> beneficial functions within that same level of complexity.

Except in every real example of a working Monte-Carlo procedure, where
the distribution and starting point have *everything* to do with whether such
a procedure is successful or not.

> For
> example, if all the beneficial states were clustered together in one
> or two areas, the average starting point, if anything, would be
> farther away than if these states were distributed more evenly
> throughout the sequence space. So, this leaves the only really
> relevant factor - the types of steps and the number of steps per unit
> of time. That is the only really important factor in searching out
> the state space - on average.

*Sigh*. The problem is that the model *you* are proposing (one I think
is silly) is of a random walk on a specific frozen sequence space
with beneficial sequences as points in that space. It does not deal
with an "average" distribution, and an "average" starting point, but
with one very specific distribution of beneficial sequences and one very
specific starting point. You cannot simply assume an "average"
distribution in the absence of background information: you have to find
out precisely the kind of distribution you are dealing with. And even
if you do find that the distribution is "stacked", it does not imply
that an intelligence was involved. The stacking could occur due to the
constraints imposed by the very definition of the problem: in the case
of evolution, by the physical constraints governing the interactions
between the molecules involved in biological systems. In fact, why
would you expect that the regular and highly predictable physical laws
governing biochemical reactions would produce a random, "average"
distribution of "beneficial sequences"?

>>For an extreme
>>example, consider a space of strings consisting of length 1000, where
>>each position can be occupied by one of 10 possible characters.

Note, I wrote, "extreme example". My point was *not* to invent a
distribution which makes it likely for evolution to occur (this example
has about as much to do with evolution as ballet does with quantum
mechanics), but to show how inadequate your methods are.

>
> Ok. This would give you a state space of 10 to the power of 1000 or
> 1e1000. That is an absolutely enormous number.
>
>
>>Suppose there are only two beneficial strings: ABC........, and
>>BBC........ (where the dots correspond to the same characters). The
>>allowed transitions between states are point mutations, that are
>>equally probable for each position and each character from the
>>alphabet. Suppose, furthermore, that we start at the beneficial state
>>ABC. Then, the probability of a transition from ABC... to BBC... in a
>>single mutation is 1/(10*1000) = 1/10000 (assuming self-loops - i.e.
>>mutations that do not alter the string, are allowed).
>
>
> You are good so far. But, you must ask yourself this question: What
> are the odds that out of a sequence space of 1e1000 the only two
> beneficial sequences with uniquely different functions will have a gap
> between them of only 1 in 10,000?

Mind-numbingly low. 1000*.9*.1^999, to be precise. But that is not the
point.
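
As a rough check of that figure, here is a minimal Python sketch. It
assumes the two beneficial sequences are drawn independently and
uniformly from the 10-character, 1000-position space. The function name
is illustrative only and does not come from this thread. The value is
far below floating-point range, so the sketch works in log10.

```python
import math

def log10_prob_exactly_one_difference(length=1000, alphabet=10):
    """log10 of P(two uniform random strings differ at exactly one position)."""
    p_same = 1.0 / alphabet   # a given position matches
    p_diff = 1.0 - p_same     # a given position differs
    # Pick the single differing position, then require all others to match.
    return (math.log10(length)
            + math.log10(p_diff)
            + (length - 1) * math.log10(p_same))

print(log10_prob_exactly_one_difference())
```

The result is about -996.05, that is, a probability on the order of
9e-997, which agrees with 1000*.9*.1^999.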

> The time required to cross this
> tiny gap would require a random walk of only 10,000 steps on average.
> For a decent sized population, this could be done in just one
> generation.

> Don't you see the problem with this little scenario of yours?
> Certainly this is a common mistake made by evolutionists, but it is
> nonetheless a fallacy of logic. What you have done is assume that
> the density of beneficial states is unimportant to the problem of
> evolution since it is possible to have the beneficial states clustered
> around your starting point. But such a close proximity of beneficial
> states is highly unlikely. On average, the beneficial states will be
> more widely distributed throughout the sequence space.

On average, yes. But didn't you just say above that the distribution
of the sequences is irrelevant? That all that matters is the "ratio" of
beneficial sequences? (Incidentally, "ratio" and "density" are not
identical. The distribution I showed you has a relatively high density
of beneficial sequences, despite a low ratio.)

> For example, say that there are 10 beneficial sequences in this
> sequence space of 1e1000. Now say one of these 10 beneficial
> sequences just happens to be one change away from your starting point
> and so the gap is only a random walk of 10,000 steps as you calculated
> above. However, on average, how long will it take to find any one of
> the other 9 beneficial states? That is the real question. You rest
> your faith in evolution on this inane notion that all of these states
> will be clustered around your starting point. If they were, that
> certainly would be a fabulous stroke of luck - like it was *designed*
> that way. But, in real life, outside of intelligent design, such
> strokes of luck are so remote as to be impossible for all practical
> purposes. On average we would expect that the other nine sequences
> would be separated from each other and our starting point by around
> 1e999 random walk steps/mutations (i.e., on average it is reasonable
> to expect there to be around 999 differences between each of the 10
> beneficial sequences). So, even if a starting sequence did happen to
> be so extraordinarily lucky to be just one positional change away from
> one of the "winning" sequences, the odds are that this luck will not
> hold up as well in the evolution of any of the other 9 "winning"
> sequences this side of a practical eternity of time.

Unless, of course, it follows from the properties of the problem that
the other 9 beneficial sequences must be close to the starting sequence.

> Real time experiments support this position rather nicely. For
> example, a recent and very interesting paper was published by Lenski
> et al., entitled, "The Evolutionary Origin of Complex Features" in
> the 2003 May issue of Nature. In this particular experiment the
> researchers studied 50 different populations, or genomes, of 3,600
> individuals. Each individual began with 50 lines of code and no
> ability to perform "logic operations". Those that evolved the ability
> to perform logic operations were rewarded, and the rewards were larger
> for operations that were "more complex". After only 15,873 generations,
> 23 of the genomes yielded descendants capable of carrying out the most
> complex logic operation: taking two inputs and determining if they are
> equivalent (the "EQU" function).

I've already covered how you've completely misinterpreted Lenski's
research in the other post. But let's run with this for a bit:

> In principle, 16 mutations (recombinations) coupled with the three
> instructions that were present in the original digital ancestor could
> have combined to produce an organism that was able to perform the
> complex equivalence operation. According to the researchers themselves,
> "Given the ancestral genome of length 50 and 26 possible instructions
> at each site, there are ~5.6 x 10^70 genotypes [sequence space]; and
> even this number underestimates the genotypic space because length
> evolves."
>
> Of course this sequence space was overcome in smaller steps. The
> researchers arbitrarily defined 6 other sequences as beneficial (NAND,
> AND, OR, NOR, XOR, and NOT functions).

As a minor quibble, I believe they actually started with NAND (you need
it for all the other functions). But I could be wrong - I read that
paper months ago.
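
For reference, the sequence-space figure quoted above (length 50, 26
possible instructions per site) can be checked with a couple of lines of
Python. This is only an arithmetic check with an illustrative variable
name. It says nothing about how the experiment itself was run.

```python
# Number of fixed-length genotypes: 26 choices at each of 50 sites.
genotypes = 26 ** 50

# Scientific notation makes the order of magnitude easy to read.
print(f"{genotypes:.2e}")
```

This gives roughly 5.6 x 10^70, matching the number quoted from the
paper.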

And after years of painstaking research, Sean finally invents the wheel.
Yes, evolution does not pop complex systems out of thin air, but
constructs them through integration and co-optation of simpler functional
components. Move along, folks, nothing to see here!

> Isn't that just fascinating? When the intermediate stepping stone
> functions were removed, the neutral gap that was created successfully
> blocked the evolution of the EQU function, which happened *not* to be
> right next door to their starting point. Of course, this is only to
> be expected based on statistical averages that go strongly against the
> notion that very many possible starting points would just happen to be
> very close to an EQU functional sequence in such a vast sequence
> space.

Here's a question for you. There were only 5 beneficial functions in
that big old sequence space of yours. They are all very standard
Boolean functions: in no way were they specifically designed by Lenski
et al. to ease the way into evolving the EQU function. How come
they were all sufficiently close in sequence space to one another, when
according to you such a thing is so highly improbable?

> Now, isn't this consistent with my predictions? This experiment was
> successful because the intelligent designers were capable of defining
> what sequences were "beneficial" for their evolving "organisms." If
> enough sequences are defined as beneficial and they are placed in just
> the right way, with the right number of spaces between them, then
> certainly such a high ratio will result in rapid evolution - as we saw
> here. However, when neutral non-defined gaps are present, they are a
> real problem for evolution. In this case, a gap of just 16 neutral
> mutations effectively blocked the evolution of the EQU function.

You are not even close. Lenski et al. didn't define which *sequences*
were "beneficial". They didn't even design functions to serve
specifically as stepping stones in the evolutionary pathways of EQU.
What they have done is to name some functions of intermediate complexity
that might be beneficial to the organism. They certainly did not tell
their program how to reach these functions, or what the systems
performing these functions might look like, but simply indicated that
there are functions at varying levels of complexity that might be useful
to an organism in its environment. Thus, they have demonstrated exactly
what they set out to: that in evolution, complex functional features are
acquired through co-optation and modification of simpler ones.

> http://naturalselection.0catch.com/Files/computerevolution.html

Thanks, but when I'm in the mood for a laugh, I prefer The Onion,
talk.origins feedback pages, or Fox News. :)

>> Thus, a random
>>walk that restarts each time after the first step (or alternatively, a
>>random walk performed by a large population of sequences, each
>>starting at state ABC...) is expected to explore, on average, 10000
>>states before finding the next beneficial sequence.
>
>
> Yes, but you are failing to consider the likelihood that your "winning
> sequence" will in fact be within these 10,000 steps on average.
>
>>Now, below, we
>>will apply your model to the same problem.
>
>
> Oh, I can hardly wait!
>
>
>>>It also depends
>>>upon how fast this space is searched through. For example, if the
>>>ratio of beneficial states to non-beneficial states is as high as say,
>>>1 in 1e12, and if 1e9 states are searched each second, how long will
>>>it take, on average, to find a new beneficial state?
>>
>>OK. Let's take my example, instead, and apply your calculations.
>>There are only 2 beneficial sequences, out of the state space of
>>1e1000 sequences.
>
>
> Ok, I'm glad that you at least realize the size of the state space.

Yes, Sean, because your statistical argument is so-oooo sophisticated
that we simple folk can't keep up...

>
>>Since the ratio of beneficial sequences to
>>non-beneficial ones is (2/10^1000), if your "statistics" are correct,
>>then I should be exploring 10^1000/2 states, on average, before
>>finding the next beneficial state. That is a huge, huge, huge number.
>>So why does my very simple random walk explore only 10,000 states,
>>when the ratio of beneficial sequences is so small?
>
>
> Yes, that is the real question and the answer is very simple - You
> either got unbelievably lucky in the positioning of your start point
> or your "beneficial" sequences were clustered by intelligent design.

But, Sean, I don't understand! You were telling me just above that the
distribution doesn't matter at all! I am applying your very rigorous,
unquestionably correct method for computing the average number of states
examined (that should work regardless of distribution and starting
point), and it tells me I should be examining 10^1000/2 states on
average. So why on earth am I examining only 10,000? Is it just
remotely possible that the distribution, and *not* the ratio, might be
what is playing the deciding role?

Once you say "yes", then you and I can talk about what an average
distribution will look like, and whether the question of "average" is
relevant or not.
But if you say "no", please tell me why your calculation fails so
miserably for my counter-example.
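
The contrast between the two estimates can be sketched in Python. This
is a toy illustration under the assumptions of the example above (1000
positions, 10 characters, uniform point mutations with self-loops, and a
walk that restarts at ABC... after every step). Positions and characters
are encoded as integers, and all names are illustrative.

```python
import random

def steps_until_target(length=1000, alphabet=10, rng=random):
    """Count single-mutation trials from the start until the one
    target neighbour (position 0 set to character 1) is produced."""
    steps = 0
    while True:
        steps += 1
        position = rng.randrange(length)      # uniformly chosen position
        character = rng.randrange(alphabet)   # uniformly chosen character
        if position == 0 and character == 1:
            return steps

def mean_steps(runs=100, seed=0):
    """Average number of trials over several independent walks."""
    rng = random.Random(seed)
    return sum(steps_until_target(rng=rng) for _ in range(runs)) / runs

ratio_estimate = 10**1000 // 2   # what a ratio-only argument predicts
local_estimate = 10 * 1000       # 1 / P(hit) for the restarting walk

if __name__ == "__main__":
    print("ratio-based estimate has", len(str(ratio_estimate)), "digits")
    print("local-search expectation:", local_estimate)
    print("simulated mean:", mean_steps())
```

The simulated mean should land near 10,000 trials, while the ratio-based
estimate is a 1000-digit number. The ratio is the same in both
calculations. Only the placement of the two beneficial strings differs.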

>>The answer is simple - the ratio of beneficial states does NOT matter!
>
>
> Yes it does. You are ignoring the highly unlikely nature of your
> scenario. Tell me, how often do you suppose your start point would
> just happen to be so close to the only other beneficial sequence in
> such a huge sequence space? Hmmmm? I find it just extraordinary that
> you would even suggest such a thing as "likely" with all sincerity of
> belief.

And have I done so? Though, now that you mention it, it may very well
be likely, and in fact even necessary, depending on the nature of the
problem we are examining. (And again, please remember that my toy
example has absolutely nothing to do with biological evolution - I am
just pointing out the general inadequacy of your methodology.)

> The ratio of beneficial to non-beneficial in your
> hypothetical scenario is absolutely minuscule and yet you still have
> this amazing faith that the starting point will most likely be close
> to the only other "winning" sequence in an absolutely enormous
> sequence space?! Your logic here is truly mysterious and your faith
> is most impressive. I'm sorry, but I just can't get into that boat
> with you. You are simply beyond me.

I am glad that I possess so much mystique in your mind's eye. :) But,
again, the purpose of my example was to blow a hole in your probability
calculations, rather than to present a workable scenario of evolution.
All I was trying to argue with this example is that your math needs a
lot of work.

>> All that matters is their distribution, and how well a particular
>>random walk is suited to explore this distribution.
>
>
> Again, you must consider the odds that your "distribution" will be so
> fortuitous as you seem to believe it will be. In fact, it has to be
> this fortuitous in order to work.

Again, I can present you with examples of real world problems where
these distributions just happen to be this fortuitous. If they
weren't, then Monte-Carlo methods would be useless in solving them.
Remember, these distributions don't arise at random, they follow
necessarily from the properties of the problem. So your arguments about
"averages" don't apply here.

> It basically has to be a set up for
> success. The deck must be stacked in an extraordinary way in your
> favor in order for your position to be tenable. If such a stacked
> deck happened at your table in Las Vegas you would be asked to leave
> the casino in short order or be arrested for "cheating" by intelligent
> design since such deck stacking only happens via intelligent design.
> Mindless processes cannot stack the deck like this. It is
> statistically impossible - for all practical purposes.
>
>>(Again, it is a
>>gross, meaningless over-simplification to model evolution as a random
>>walk over a frozen N-dimensional sequence space, but my point is that
>>your calculations are wrong even for that relatively simple model.)
>
>
> Come now Robin - who is trying to stack the deck artificially in their
> own favor here? My calculations are not based on the assumption of a
> stacked deck like your calculations are, but upon a more likely
> distribution of beneficial sequences in sequence space. The fact of
> the matter is that sequence space does indeed contain vastly more
> absolutely non-beneficial sequences than it does those that are even
> remotely beneficial.

Yes, but your calculations are based on the equally unfounded assumption
that the deck is not stacked in any way, shape, or form. (That is, if
the sequences were really distributed evenly in your frozen sequence
space, then your probability calculation would still be off, but not by
too much.) What makes you think that the laws of physics do not stack
the deck sufficiently to make evolution possible? You may feel that
they can't: but in the meantime, you should be striving to find out what
the actual distribution is, rather than assuming it is unstacked. (Not
that this would make your model relevant, but it'll be a small step in
the right direction.)

> In fact, there is an entire theory called the
> "Neutral Theory of Evolution". Of all mutations that occur in every
> generation in say, humans (around 200 to 300 per generation), the
> large majority of them are completely "neutral" and those few that are
> functional are almost always detrimental. This ratio of beneficial to
> non-beneficial is truly small and gets exponentially smaller with each
> step up the ladder of specified functional complexity. Truly,
> evolution gets into very deep weeds very quickly beyond the lowest
> levels of functional/informational complexity.

The fact that the vast majority of mutations are neutral does not imply
that there exists any point where there is no opportunity for a
beneficial mutation. And where such an opportunity presents itself,
evolution will eventually find it, given large enough populations and
sufficient time.

>>>It will take
>>>just over 1,000 seconds - a bit less than 20 minutes on average. But,
>>>what happens if at higher levels of functional complexity the density
>>>of beneficial functions decreases exponentially with each step up the
>>>ladder? The rate of search stays the same, but the junk sequences
>>>increase exponentially and so the time required to find the rarer and
>>>rarer beneficial states also increases exponentially.
>>
>>The above is only true if you use the following search algorithm:
>>
>> 1. Generate a completely random N-character sequence
>> 2. If the sequence is beneficial, say "OK";
>> Otherwise, go to step 1.
>
>
> Actually the above is also true if you start with a likely starting
> point. A likely starting point will be an average distance away from
> the next closest beneficial sequence. A random mutation to a sequence
> that does not find the new beneficial sequence will not be selectable
> as advantageous and a random walk will begin.

Actually, your last paragraph will be approximately true only if all
your "beneficial" points are uniformly spread out through your sequence
space. Even then, your probability calculation will be off by some
orders of magnitude, since you will actually need to apply combinatorial
formulas to compute these probabilities correctly. But, I suppose,
it'll be close enough.

>
>>For an alphabet of size S, where only k characters are "beneficial"
>>for each position, the above search algorithm will indeed need to explore
>>exponentially many states in N (on average, (S/k)^N), before finding a
>>beneficial state. But, this analysis applies only to the above search
>>algorithm - an extremely naive approach that resembles nothing that
>>is going on in nature.
>
>
> Oh really? How do you propose that nature gets around this problem?
> How does nature stack the deck so that its starting point is so close
> to all the beneficial sequences that otherwise have such a low density
> in sequence space?

OK, Sean. Pause for a second. Do you really believe that evolution
works by repeatedly generating random long nucleotide sequences *de
novo*? Yes or no? That is the algorithm I was describing above.
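
For concreteness, the generate-and-test algorithm quoted above can be
sketched in Python on a space small enough to simulate. The parameters
(N=5, S=4, k=1) and all function names are illustrative assumptions, not
values from this thread. The point is only that the average number of
draws tracks (S/k)^N for this particular algorithm.

```python
import random

def naive_search_draws(n, alphabet, beneficial, rng):
    """Draw whole random sequences until every position holds one of
    the 'beneficial' characters; return how many draws that took."""
    draws = 0
    while True:
        draws += 1
        # all() stops at the first non-beneficial character; the draw
        # still succeeds with probability (beneficial / alphabet) ** n.
        if all(rng.randrange(alphabet) < beneficial for _ in range(n)):
            return draws

def mean_draws(n=5, alphabet=4, beneficial=1, runs=300, seed=0):
    """Average draws needed over several independent searches."""
    rng = random.Random(seed)
    total = sum(naive_search_draws(n, alphabet, beneficial, rng)
                for _ in range(runs))
    return total / runs

def expected_draws(n, alphabet, beneficial):
    """Closed-form expectation (S/k)^N for the naive search."""
    return (alphabet / beneficial) ** n

if __name__ == "__main__":
    print("expected:", expected_draws(5, 4, 1))
    print("simulated:", mean_draws())
```

Nothing in this sketch modifies an existing sequence. Each draw starts
from scratch, which is why the ratio alone governs its running time.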

>>The above algorithm isn't even a random walk
>>per se, since random walks make local modifications to the current
>>state, rather than generate entire states anew.
>
>
> The random walk I am talking about does indeed make local
> modifications to a current sequence. However, if you want to get from
> the type of function produced by one state to a new type of function
> produced by a different state/sequence, you will need to eventually
> leave your first state and move onto the next across whatever neutral
> gap there might be in the way.

If any. Depending on the distribution of states in sequence space, none
may exist.

> If a new function requires a sequence
> that does not happen to be as fortuitously close to your starting
> sequence as you like to imagine, then you might be in just a bit of a
> pickle. Please though, do explain to me how it is so easy to get from
> your current state, one random walk step at a time, to a new state
> with a new type of function when the density of beneficial sequences
> of the new type of function are extraordinarily infinitesimal?

Because the *density* need not be infinitesimal. Locally, the density
can be quite high. Again, what exactly, is your argument against the
idea that all the beneficial sequences observed in nature would be
necessarily clustered relatively close together in sequence space, as
required by biochemistry? Or do you have a compelling argument that
this distribution should be purely random?

>>A random walk
>>starting at a given beneficial sequence, and allowing certain
>>transitions from one sequence to another, would require a completely
>>different type of analysis. In the analyses of most such search
>>algorithms, the "ratio" of beneficial sequences would be irrelevant -
>>it is their *distribution* that would determine how well such an
>>algorithm would perform.
>
>
> The most likely distribution of beneficial sequences is determined by
> their density/ratio. You cannot simply assume that the deck will be
> so fantastically stacked in the favor of your neat little evolutionary
> scenario. I mean really, if the deck was stacked like this with lots
> of beneficial sequences neatly clustered around your starting point,
> evolution would happen very quickly. Of course, there have been those
> who propose the "Baby Bear Hypothesis". That is, the clustering is
> "just right" so that the theory of evolution works. That is the best
> you can hope for.

Or it would be, if I thought the model you propose was even remotely
realistic. But, here are a few hints for you: 1) the "sequence space"
does not have a fixed dimension; 2) the multi-dimensional fitness
landscape changes over time, partially as the result of evolutionary
processes. If your statistics are inadequate even for your very simple
model, how can you expect them to be even remotely relevant for a
problem that is much, much more complicated?

> Against all odds the deck was stacked just right so
> that we can still believe in evolution. Well, if this were the case
> then it would still be evolution by design. Mindless processes just
> can't stack the deck like you are proposing.

Really, now? I would think that processes operating with predictable
regularity might be able to. Such as, say, the laws of physics.
Remember: "mindless" != "random".

>
>>My example above demonstrates a problem
>>where the ratio of beneficial states is extremely tiny, yet the
>>search finds a new beneficial state relatively quickly.
>
>
> Yes - because you stacked the deck in your favor via deliberate
> design. You did not even try to explain the likelihood of this
> scenario in real life. How do you propose that this is even a remote
> reflection of what mindless processes are capable of? I'm talking
> average probabilities here while you are talking about extraordinarily
> unlikely scenarios that are basically impossible outside of deliberate
> design.

And I am saying average probabilities do not apply when you are
concerned with determining one particular distribution. And as my
example shows, the distribution is all that matters. Find the
distribution, if you can, and then we'll talk.

>> I could also
>>very easily construct an example where the ratio is nearly one, yet a
>>random walk starting at a given beneficial sequence would stall with a
>>very high probability.
>
>
> Oh really? You can construct a scenario where all sequences are
> beneficial and yet evolution cannot evolve a new one? Come on now . .
> . now you're just being silly. But I certainly would like to see you
> try and set up such a scenario. I think it would be most
> entertaining.

I didn't say all sequences are beneficial, Sean. That *would* be silly.
I did say that the ratio *approaches* one, but is not quite that.
But, here you are:

Same "sequence space" as before, but now a sequence is "beneficial" if
it is AAAAAAAAAA......AAA (all A's), or it differs from AAAAA...AAA by
at least 3 amino acids. All other sequences are *harmful* - if the
random walk ever stumbles onto one, it will die off, and will need to
return to its starting point. (This means there are exactly 1000*9 +
(1000*999/2)*81 or about 4.05e7 harmful sequences, and 1e1000-4.05e7 or
about 1e1000 beneficial sequences: that is, virtually every sequence is
beneficial.) Again, the allowed transitions are point mutations, and
the starting point is none other than AAAAAAA...AAA. Now, will this random
walk ever find another beneficial sequence?

What does this have to do with evolution? Nothing. But everything to
do with how a distribution can affect a random walk.
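
A toy simulation of this stalled walk, as a hedged sketch: it assumes
only that every sequence exactly one change away from the all-A start is
harmful, which holds under the definition above. 'A' is encoded as 0,
and the function name and step count are illustrative.

```python
import random

def stalled_walk(length=1000, alphabet=10, steps=2000, seed=0):
    """Apply point mutations to the all-'A' start (all zeros here).
    Landing on a one-change neighbour kills the walk, which restarts.
    Returns (self_loops, resets, escapes)."""
    rng = random.Random(seed)
    start = [0] * length
    self_loops = resets = escapes = 0
    for _ in range(steps):
        current = list(start)   # the walk is always back at the start
        current[rng.randrange(length)] = rng.randrange(alphabet)
        distance = sum(a != b for a, b in zip(current, start))
        if distance == 0:
            self_loops += 1     # mutation left the string unchanged
        elif distance == 1:
            resets += 1         # harmful neighbour: die and restart
        else:
            escapes += 1        # cannot happen with one point mutation
    return self_loops, resets, escapes

if __name__ == "__main__":
    print(stalled_walk())
```

Every step is either a self-loop or a reset, so the escape count stays
at zero however long the walk runs, even though nearly the whole space
is "beneficial".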

>>In other words, Sean, your calculations are
>>irrelevant for the kind of problem you are trying to analyze.
>
>
> Only if you want to bury your head in the sand and force yourself to
> believe in the fairytale scenarios that you are trying to float.
>
>>If you
>>wish to model evolution as a random walk of point mutations on a
>>frozen N-dimensional sequence space, you will need to apply a totally
>>different statistical analysis: one that takes into account the
>>distributions of known "beneficial" sequences in sequence space. And
>>then I'll tell you why that model too is so wrong as to be totally
>>irrelevant.
>
>
> And if you wish to model evolution as a walk between tight clusters of
> beneficial sequences in an otherwise extraordinarily low density
> sequence space, then I have some oceanfront property in Arizona to
> sell you at a great price.

If I did wish to model evolution this way, then I would gladly buy this
property off your hands. And then sell it back to you at twice the
price, because it would still be better than the model you propose.

> Until then, this is all I have time for today.
>
>
>
>

> Sean
> www.naturalselection.0catch.com

Cheers,
RobinGoodfellow.

Dunk

Jan 4, 2004, 9:28:59 AM


Major kudos to Von Smith for his ** format ** as well as content.

The intercalated style of argumentative reply nearly doubles in length
with each reply. This is unnecessary, since essentially the same
argument recurs, ah, more than twice.

Von Smith's 'summary of the main argument' format is very
constructive, and likely to get somewhere sooner.

Dunk

Howard Hershey

Jan 4, 2004, 10:12:07 AM

Sean Pitman wrote:
>
> Howard Hershey <hers...@indiana.edu> wrote in message news:<3FF4957C...@indiana.edu>...
>
> > 1) You seem to agree that the native ebga does not have any selectable
> > lactase activity. Thus generating selectable lactase activity from ebg
> > is generating a 'new' function. Is that right?
>
> Yes.
>
> > 2) You do agree that the native ebg involves a two peptide system, with
> > ebga being 1030 amino acids long and ebgc being 149 amino acids long,
> > both being required for function, plus a regulatory protein (ebgr) which
> > is also around 1000 amino acids long.
>
> At this point it might be helpful to consider that the usual wild type
> lacZ genes in E. coli produce a tetramer beta-galactosidase. Each
> subunit of this tetramer is around 1000aa in size.

The subunits of the alpha and beta units of ebg are 1030 and 149 amino
acids. It also forms a multimeric enzyme. Any statement about the
complexity of the standard w.t. lacZ gene also holds for ebg, doesn't
it? Aren't they at the same level of complexity, with perhaps a nod to
ebg (because it is slightly longer and composed of two peptides)?

> However, this is
> not the minimum size requirement for this type of function to be
> realized at a beneficial level of selectability. The minimum size
> requirement seems to be well over 400aa.

So you keep asserting without telling anyone how you poofed that number
out of thin air.

> Considering that 12 to 14 of
> 15 active site residues are identical between LacZ and ebgA, I would
> also think that the minimum sequence requirements would also be
> similar (i.e., somewhere around 400aa). Also, it is interesting to
> note that the ebgC sequence has none of the active site residues and
> yet it seems to be essential, as you noted yourself, for the lactase
> function.

This would not be surprising to anyone with a knowledge of how the alpha
peptide of the lacZ gene can complement a non-functional lacZ missing
the alpha peptide region. BTW, the alpha peptide complementation of
lacZ is the basis of many of those blue/white tests to identify bacteria
with plasmids having DNA inserts.

> It appears that this small subunit is essential for the
> optimal operation of electrophilic catalysis by the active-site Mg^2+.
> Also note that a correct mutation in either the ebgR or the ebgA
> genes alone will allow selectably advantageous lactase ability. Of
> course both mutations occurring at the same time allow for a much
> stronger lactase function, but both mutations are not required before
> selectable lactase function can be realized. It is known that the
> mutation in ebgR arises first that allows the cells to grow very
> slowly on lactulose. The second mutation (in the ebgA gene) then
> arises and allows the double mutants to grow very rapidly.

Yes. Knowing the minimum sequence requirements of a lactase is
completely irrelevant to calculating the organism's ability to evolve
lactase activity. What matters is how many mutations away from
selectable lactase activity some actual gene in the organism is.
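
[To make "how many mutations away" concrete, here is a minimal sketch
that counts substitutions between equal-length sequences and reports
which gene in a genome is closest to a target. The sequences and gene
names are made-up toy strings, not real ebg or lacZ data, and counting
only substitutions (no insertions or deletions) is a simplifying
assumption of mine.]

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b))


def nearest_gene(genome: dict, target: str) -> tuple:
    """Return (gene name, distance) for the gene closest to the target."""
    distances = ((name, hamming(seq, target)) for name, seq in genome.items())
    return min(distances, key=lambda pair: pair[1])


# Hypothetical toy data: one gene with a related function, one without.
toy_genome = {
    "related_hydrolase": "MKTAYLAG",
    "unrelated_enzyme": "GGWPQRSV",
}
toy_target = "MKTAYIAG"

print(nearest_gene(toy_genome, toy_target))
```

[The distance that matters is the smallest one over the genes the
organism actually has, not a property of the target alone.]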


>
> http://www.biochemj.org/bj/325/0117/3250117.pdf
> http://www.science.siu.edu/microbiology/micr460/460%20Pages/460.SPAM.html
>
> > 3) You seem to think that knowing the total length of the proteins
> > involved (in this case, about 1200 for the two that act together at the
> > same time) and how many proteins are involved in the system (2) allows
> > you to determine the number of amino acids that are 'fairly specified'
> > and the 'level of complexity'.
>
> As I have said many many times before, I am interested in knowing the
> *minimum* number and specificity of amino acids required to achieve a
> particular type of function.

Well, so am I. Not that knowing that number is relevant to evolution.
But I am interested in how you went about calculating it.

[At the very end of this long post, I find out that the minimum number
is not, in fact, the minimum number. It is an *estimate* of some
unspecified sort based on the smallest *known* sequence with that
particular activity. Why it is an *estimate* rather than the actual
number of amino acids in smallest *known* sequence with that activity is
not addressed. Nor is any mention made of what this smallest known
sequence is. Apparently no effort at all was made to determine whether
or not the smallest *known* sequence has any relationship at all to the
*minimum* requirements for total amino acid length nor what fraction of
even the smallest *known* sequence is 'fairly specified' (which remains
undefined). The possibility remains that the number was pulled out of
the ether.]

> I dare say that the 1200aa normally used
> in this case are not all needed and are not all that constrained.
> As explained already, a more likely minimum number of required amino
> acids is probably somewhere around 400 relatively loosely specified
> amino acids.

And, as I have repeatedly asked, both politely and not, "HOW THE F**K
DID YOU ARRIVE AT THAT NUMBER?" It seems to me that all you did was
wave your hands and poof the number out of thin air. You certainly have
been repeatedly asked to justify that number and all you have done is
wave your hands and poof it out of thin air again.

> > Please perform this mathematics on the
> > ebg system for me. If you cannot calculate "the total number of amino
> > acids required for a particular type of function to be realized at its
> > most minimum beneficial level of function" for a simple system like ebg,
> > what makes you think you can do so for a larger or more complex one?
>
> But I can. I suggest to you that the type of function produced by the
> ebg system has a very similar minimum size requirement and positional
> constraint limits as do other lactase genes/systems which seem to have
> a minimum requirement of somewhere over 400 relatively loosely
> specified amino acids.
>
> Now, you can easily prove me wrong here by finding a functional
> lactase enzyme that requires less than 400aa. Do you know of such a
> lactase that actually works to some selectable advantage in any living
> thing?

ebg is NOT a lactase. That is important for you to remember. ebg is
NOT a lactase. Before mutation it does not have selectable lactase
activity. You yourself said this was true. Repeat after me: ebg is
NOT a lactase. ebg is NOT a lactase. But I certainly agree that it is
at least as complex as the lacZ gene product and probably just as
complex as any other protein of the same length without internal repeats
if you actually have a way of measuring complexity.

> > 4) After calculating "the total number of amino acids required for a
> > particular type of function to be realized at its most minimum
> > beneficial level of function" (you claim to be able to do so -- at least
> > I have seen you give estimates of around 480 or so for ebg, but it would
> > certainly be nice to see what went into the calculation)
>
> This calculation is based on my own database search and the searches of
> others that suggest that there are no functional lactase enzymes
> smaller than 400aa.

Are all 400 aa completely constrained in the 400 aa sequence? How is
this a valid estimate of the *minimum* number of amino acids needed for
lactase activity rather than just the smallest known number? I think
this is an absurd way of determining the minimal requirements of a
lactase. Absurd both because most lactases are not independently
designed minimal structures, but are evolved from past lactases and also
because it doesn't take any knowledge about how enzymes work into
account. I agree that a certain minimum number of amino acids are
needed to form the 3-D structure needed to bring the relevant amino
acids that *independently* bind the substrate optimally (not too tight,
not too loosely) and the other amino acids that provide the nucleophile
for the hydrolysis into the proper position.

There are only a few amino acids that are actually crucially involved.
A few amino acids that determine substrate specificity -- in this case,
specifically binding a galactose sugar moiety and its glycoside linkage
and another few amino acids that provide the nucleophile that lowers the
energy of activation of hydrolysis of that linkage. Most of the amino
acids in the protein that have any other function are involved in
ensuring that those few are at the right position a sufficient amount of
the time and are flexible enough to bind the substrate and release it
after hydrolysis. That is, most of the amino acids beyond a very few
are involved in origami. Now, and here is the important part, *any*
amino acid sequence that provides the proper origami so that the crucial
few amino acids are in the proper position will have lactase activity.
There are many ways to provide that 3-D structure, some using many more
than 400 amino acids, some using 400 amino acids; there may even be some that
use significantly less than 400 amino acids. But we only know the
lactases which *nature* found, not all possible nor, certainly, *the*
minimal lactase. In fact, nature sticks primarily with what works and
first-come, first-serve. Nature is extraordinarily wasteful in pursuit
of reproductive success, efficiency is not evident in biochemistry, and
the panda's thumb shows that first-come solutions tend to prevent even
better ones. [This is also the reason why HbS/HbA resistance to malaria
took root much more strongly than HbC/HbC resistance.] Nature has
little reason to find *the* minimal-sized lactase. Now, it may well be
that a particular type of structure may be needed to form a cleft of the
proper size and hydrophobicity to be useful for enzymatic activity.
This can be evidenced by certain deletion or insertion mutations. Up to
a point, insertions and deletions in those other parts of a protein
structure that are flexible bends have little selective effect (are
selectively neutral).

> So, it seems like the ~400aa level is the best
> "minimum" requirement that the evidence available to me so far
> supports. If you think otherwise, please do present this evidence.

I think that searching for a minimum *existing* size by looking through
the sequences of *evolved* enzymes is a poor way to find the bare
minimum total size and also will tell you little or nothing about how
many of the amino acids *within* that minimum size are 'fairly
constrained' (which remains a nebulous undefined concept).


>
> >I want you to
> > calculate the odds of ebg evolving into a selectable beta-galactosidase
> > enzyme *based solely on these numbers* and NOT based on any other
> > knowledge. This estimate of the odds of generating functional lactase
> > activity from ebg would be called a 'prediction' of *your* 'hypothesis'.
>
> The odds are extremely good that the wild-type ebg sequence will
> evolve into a selectable beta-galactosidase in short order (one or two
> generations) since it is only a single positional change away from
> success, but that is not the important question.

Yes, it is. Your *claim* is that one can, by looking *only* at the
minimum amino acid size needed for a function, declare *from that
knowledge alone* how easy or impossible it is to generate a particular
function. That is exactly what you do with the bacterial flagellum. You
calculate the minimum amino acid size (by bogus methodology or not is
for later analysis) and then *declare* and assert that because the
minimum amino acid number is 6000 or 1000 or 400 or whatever, it is
impossible for a bacterial flagellum to evolve from *any* precursor
state. Using that same logic on ebg, which, as you point out, has NO
selectable lactase activity, and which also needs, as you point out, 400
minimum amino acids to have lactase activity, you should NOT be able to
generate any lactase activity from ebg by single mutations if you apply
your mathematical model of evolution to ebg. The reality is, of course,
that you can evolve lactase activity relatively easily. The reason is
that ebg is not some *random* protein or *random* sequence. Yet you
insist that you can still tell how hard it is to evolve a function based
solely on the minimal number needed for that function.

> My idea doesn't look
> at sequences so much as it looks at types of functions. What are the
> odds that a particular organism or group of organisms will have
> anything within their collective genomes that is close enough to
> evolve any type of new beneficial function within a given level of
> specified complexity? That is the important question.

And the answer is highly variable. Remember that ebg is NOT a lactase.
It does, however, have a glycoside binding site that can be converted to
binding galactoside linkages (no change is needed to hydrolyse that
linkage). And I would expect most bacteria to have *some* enzymes that
perform glycoside hydrolysis. Now, a particular bacterium may NOT be
able to evolve a lactase from every one of its glycoside hydrolases
because the current function of that enzyme may be more important than
the gain obtained by having lactase activity. That is why duplication
of an existing enzyme is often the first step in evolution. But I
certainly would expect lactase to evolve from a pre-existing glycoside
hydrolytic enzyme rather than from some *random* protein or *random*
sequence, which is what the math of your model repeatedly proposes
(although you deny it just as often).


>
> Given this question, it is very interesting that the E. coli bacterial
> species seems to have a "spare tire" lactase gene that is just one
> mutation away from success.

Why is it amazing to you that a bacterium has other glycoside hydrolases?
It is interesting that the ebg hydrolase activity is dispensable in
conditions where lactase activity is crucial to survival. That is,
duplication was not a prerequisite here. But that is about all that is
interesting. Ebg is not present in E. coli to be a "spare tire"
lactase. If it were, it would have retained lactase activity. It has a
related function, but that function is not lactase activity.

> This would not be such an interesting
> finding if lactases were less specified than they are. For example
> if the density of lactase sequences in 400aa level of sequence space
> were say as high as 1 in a billion, the average gap between lactases
> would be less than 7 mutations wide.

Please show your math. This is hand-waving numerology. And irrelevant
because no one (but you, and you deny it) is arguing that evolution
works by starting with a *random* 400 aa.
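
[One guess at where "less than 7" could come from, offered only as a
sketch: if each position can hold any of 20 equally likely amino acids,
then a density of 1 in N corresponds to roughly log(N)/log(20) fully
specified positions. Both the 20-equiprobable-residue model and this
reading of "gap" are my assumptions; the quoted post does not state its
method.]

```python
import math

AMINO_ACIDS = 20  # assumption: 20 equally likely residues per position


def gap_width(density: float) -> float:
    """Number of positions k for which AMINO_ACIDS**k equals 1/density."""
    return math.log(1.0 / density) / math.log(AMINO_ACIDS)


# Density of "1 in a billion" from the quoted post.
print(f"{gap_width(1e-9):.2f}")
```

[Under those assumptions the result is a little under 7, since 20^7 is
about 1.28 billion. Note that nothing in this arithmetic says anything
about where an actual organism's genes sit relative to a lactase.]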

> For a colony of bacteria
> numbering say 10 billion individuals, this gap would be crossed in no
> more than several months by all types of bacteria. What is
> interesting though is the very "limited evolutionary potential" that
> many types of bacteria have when it comes to the evolution of this
> relatively simple enzymatic function. Without their spare tire gene,
> E. coli cannot evolve this lactase function despite very positive
> selection pressure, artificially elevated mutation rates, and tens of
> thousands of generations of time. Many other types of bacteria have
> not been able to evolve this relatively simple lactase function
> despite well over a million generations of documented observation.
>
> So, what does this mean? It means that the density of sequences with
> the lactase function is actually quite low. This low density is what
> limits the evolutionary potential of many organisms that would
> otherwise benefit from a lactase enzyme if they were able to evolve
> one. The fact that they do not evolve one means that the gap between
> what they have and the nearest lactase enzyme is simply more than a
> dozen fairly specified mutations away.
>
> > 5) Now let's take a different protein or protein system, also 1200 total
> > amino acids in length. We will make it a bit less complex, by making it
> > a single unregulated protein.
>
> The fact that a protein operates as a single unit does not make it
> less complex than a multiprotein function that requires the same
> minimum amino acid number and level of specificity.

Yet it seems that only really large multiprotein complex systems seem to
reach the state of unevolvability. Really large single proteins never
do.

> Also, all protein
> functions are regulated in one form or another.
>
> > But this protein is in the histidine
> > pathway. That is, it is a random protein wrt lactase function, chosen
> > merely because of total amino acids present. Let's say that for *its
> > function* the very same "total number of amino acids required for a
> > particular type of function to be realized at its most minimum
> > beneficial level of function" exists. I want you to calculate the odds
> > of this protein evolving into a selectable beta-galactosidase enzyme
> > *based solely on the numbers you think are important* and NOT based on
> > any other knowledge about this protein.
>
> Starting with a random sequence of 1,200aa acting in some beneficial
> manner, you are asking how long it would take to evolve a
> beta-galactosidase? Is that what you are asking?

That is what *you* repeatedly assume, in your mathematical analysis.
But I did not say it was a random sequence. I said it was a protein
which had its own function, but that function was completely unrelated
to glycoside hydrolysis. That is, unlike ebg, it has no glycoside
binding site and no active site for the hydrolysis of glycoside
linkages. That is what your entire calculation is based on. Starting
with either a random protein or a random sequence of 400 aa and evolving
lactase activity from that starting point.

> If so, then say the
> density of lactases in sequence space of 400aa minimum was low enough
> to require 24 specified mutations, on average, to go from one lactase
> island to another. If true, then, on average, a sequence of 400aa in
> a given gene pool would be around 12 specified mutations away from the
> closest lactase sequence creating a gap of 4,000 trillion non-lactase
> sequences. Say the colony size is 1 trillion individuals living in a
> steady state and the mutation rate is one mutation per 400aa per year
> per individual lineage (a pretty high mutation rate). Well, starting
> with 1,200aa in a colony of 1 trillion would give us 3 trillion
> sequences of 400aa each evolving at the same time (given that this
> 1,200aa sequence was released from selective constraints perhaps via
> gene duplication). This means that each year 3 trillion sequences out
> of 4,000 trillion will be searched out. At this rate, on average,
> success will be realized in just over 1,300 years on average (defined
> as the evolution of a beneficial lactase function in one member of the
> population).

IOW, significantly more than the five years you allow in
experimentation. And significantly more than were required if the
protein were a non-random protein like ebg. Despite both ebg and this
protein having a 400 aa minimum to have lactase activity.
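
[The arithmetic in the quoted paragraph can be reproduced with a short
sketch. It assumes, as the quoted post appears to, that a gap of k
specified mutations means 20^k candidate sequences and that the
population samples them exhaustively at a fixed rate per year. The
function name is mine, and the model is the quoted post's, not a model
of how evolution actually proceeds.]

```python
AMINO_ACIDS = 20  # assumption: 20 possible residues per specified position


def years_to_search(gap_mutations: int, sequences_per_year: float) -> float:
    """Years needed to sample AMINO_ACIDS**gap_mutations sequences."""
    return AMINO_ACIDS ** gap_mutations / sequences_per_year


colony_size = 1e12       # individuals, from the quoted post
windows_per_protein = 3  # 1,200 aa treated as three 400 aa sequences
rate = colony_size * windows_per_protein  # one mutation per 400 aa per year

# Gap sizes: 1 or 2 (an ebg-like starting point) versus 12 (the quoted case).
for gap in (1, 2, 12):
    print(gap, years_to_search(gap, rate))
```

[With a gap of 12 this gives 20^12 / 3 trillion, or about 1,365 years,
matching the "just over 1,300 years" above. With a gap of 1 or 2 the
same formula gives a tiny fraction of a year. The minimum size of 400 aa
is identical in every case; only the starting distance changes the
answer.]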



> > 6) Same thing, except now we have a completely random sequence of 1200
> > total amino acids.
> >
> > I will accept failure to evolve a selectable beta-galactosidase activity
> > in five years as evidence that your math is correct for *that type of
> > protein* (even though I really should wait a gazillion years, just to be sure).
>
> Ok, what is your counter argument?

My point is that neither of the last two are proposed evolutionary
models. The first one, where evolution proceeds by modification of a
pre-existing enzyme with related function, is doable in real time. The
first model produces lactase with one or two mutational steps. Your
models, which assume one or the other of the last two cases, would take
many more than five years. The last two represent the straw man
evolution your math assumes. Reality does not.

> If the density of lactases in
> sequence space of 400aa was very much less than what I based my above
> calculations on, then why were Hall's E. coli so limited in their
> ability to evolve a type of function with such a high density of
> sequences in sequence space?

E. coli has a *specific* set of completely *non-random* sequences in
sequence space. What matters, and the only thing that matters, is that
it had at least one sequence which *was* able to *rapidly* evolve
lactase activity. Part of this may be a consequence of the fortuitous
fact that ebg's original function is a dispensable function when
selection is strong for lactase activity. Otherwise, one would probably
need the fortuitous duplication of another glycoside hydrolase and its
subsequent modification.

> > That is because your bogus math does not take the specific ancestral
> > sequence and its pre-existing functionality into consideration.
>
> Actually it does. No matter what you start with you cannot get around
> the fact that on average your starting points will be a certain
> distance from new sequences with new types of functions. This
> distance gets exponentially larger, no matter what your starting
> sequences are, at higher and higher levels of specified complexity.

And how does one measure levels of specified complexity? Is not ebg
just as (or more) complex as the putative histidine synthetic enzyme of
the same length in the discussion above? Yet the putative histidine
synthetic enzyme was many more changes away from lactase activity than
ebg. It was about as far away as a random sequence would be. How much
specified complexity does a random sequence of that length have? How do
you calculate levels of specified complexity and what does it have to do
with evolutionary mechanisms?

> For example, let's just say, by a sheer extraordinary stroke of luck
> that an ancestral sequence in a bacterial colony just happened to be
> one or two mutations away from a new type of function as specified and
> complex as a flagellar motility system. Well, of course this highly
> complex system would evolve in short order now wouldn't it? Ok, but
> how many more such systems would it be able to evolve on average?

First, it wouldn't be sheer extraordinary luck for there to be several
to many functioning glycoside hydrolases in a bacterium. It would be a
matter of luck (but not sheer extraordinary luck) for one of these to be
only a few mutations away from being modified to a somewhat different
function (say improved ability to bind a formerly weakly bound secondary
substrate). Evolution is not teleologic in nature. Systems evolve by
tinkering with what exists, not by direct synthesis of a system from
design or from scratch (some random protein or random sequence). If
nothing exists for the tinkering to start with, and no intermediate
stages of functionality exist, it won't happen.

> How
> long would it take that colony to evolve another type of function at
> that same level of complexity or higher given what it now has to
> proceed with?

ebg has no lactase activity. It, therefore, evolved another type of
function at the same level of complexity (400 aa, according to you) as
lacZ. This occurred by a single mutation to generate selectable
function despite the fact that there is only 25% or so sequence identity
between the two proteins (and that is not much larger than chance
alone), and this initial selectable level of function was improvable and
improved on by a couple of subsequent mutations. Much of evolution
involved changes of function in families of genes which all have related
functionality just like this. This is evidenced by sequence identity
and structural and functional similarity of the existing genes. The
evolution of lactase activity in ebg is of this type. But so is the
evolution of the globin genes of hemoglobin and their relationship to
myoglobin. And the evolution of flagellar motility from a previous
TTSS-like non-motile function also involved only a few changes in
sequence. Most of the 'functional' relationships (which protein binds
to which other protein) were unchanged in going from a TTSS to a
flagellar function. The proteins that compose the bacterial flagella
are not a random sample of proteins; they are a non-random sample of
proteins that are very similar to the proteins that form a TTSS system.
Other evolution involves chimera formation to generate new function.
Examples of this include the antifreeze gene of certain Arctic fish, and
possibly the changes in allosteric effectors of transmembrane signalling functions.

> Odds are that everything it has will be gazillions of
> years away from any other type of function within such a level of
> complexity or higher.

Compare the level of complexity within a TTSS system without and one
with motility of the 'whip/protein export tubule'. Those two systems
are not separated by a very large change in 'level of complexity' nor,
necessarily, by hundreds of amino acid changes. And it is the
*difference* between selectable steps that is important. No one is
claiming that flagella arose *as flagella* by evolving it directly from
20 random proteins. Flagella arose by a modification of a pre-existing
TTSS-like system. And that system itself did not contain random
proteins. It contained, say, 19 proteins that were as similar to 19
flagella proteins as ebg was to lacZ. Again, the *only* way your
calculation makes any sense at all is if you are assuming that the
starting point is some *random* or *average* protein or *random* or
*average* sequence and that the only possible function is the teleologic
one you declare.

> In fact, the odds are so great against
> evolution at such levels that the witnessing of evolution at such a
> level should cause one to seriously look into the almost certain
> finding of a pre-existing system that had been lost for a time but
> whose code was still there pretty much intact.

Or a system that has a different current 'function', but which *can* be
modified, by tinkering around the edges, into a system with an emergent
function. Such as modifying a TTSS-like or other whip-like system to a
motility function (as has happened independently at least twice, in
eubacteria and archaebacteria). Or, if that pre-existing system is
unavailable, modifying a secretory system to a 'new' motility function.
Or if that pre-existing system is unavailable, modify an internal
transport mechanism involving microtubules into undulopodia. Or if that
pre-existing system is unavailable, modify a different internal system
of transport (actin/myosin-like) to generate amoeba-like pseudopodia.
Motility, and the need or utility of motility, is occasionally a strong
selective force, and evolution tinkers with what exists to try to
generate a motility system. It doesn't start with random or average sequences.


>
> For example, cavefish who have lost their eyes still have the code to
> make eyes in their genome. It has been shown that a single point
> mutation can restore the production of fully formed eyes in the
> offspring of these fish. Does this mean that eye evolution has been
> demonstrated? Absolutely not.

It is a demonstration that one does not need to invent the wheel from
scratch and start with a random sequence.

> All this shows is that the evolution
> of such a highly complex system requires a pre-existing code for this
> system that has been shut down by a slight change to the system.

Yes, for the particular case you mention. But the existence of 'eyes'
of various intermediacy shows that intermediate states to eye formation
can certainly have independent functional utility that are not thousands
of changes from one another. Indeed, even the chemistry of vision shows
two independent re-utilizations of a retained primitive step (the first
steps using cis-retinal) and then co-option of other, independent
pre-existing steps to generate the nerve signal. And the independent
evolution of eyes in squid and humans uses these quite distinct biochemistries.

> Without this historical existence of sightedness in the ancestors of
> these fish, they would never have been able to evolve the ability to
> see.

For fish, certainly. But that is not the case for squid and fish.

> The same is true for flagellar motility. I know that you have
> suggested that the ability to mutate a flagellar system so that it no
> longer works as a motility system, just keeping its TTSS system
> intact, and then mutating back the motility function is an example of
> high complexity evolution in action.

No. I pointed it out as a trivial example, not one directly identical to
the steps involved in the evolution of the system. But just as there are
organisms with intermediate structures for vision that have independent
utility, the TTSS systems show that one does not need to have the full
dual functionalities of TTSS-like activity *and* motility in order to
have selectable activity. That is, the TTSS system can be selected for
its utility independent of the motility function. The same cannot be
said for the eye with no vision in cave fish -- it has gone from being a
structure selected for utility in vision to a vestige that has no
purpose wrt vision or anything else. It has no independent utility as a
vestige. But the pinhole eye of Nautilus does have selectable purpose.
The pit eyes of many invertebrates do have independent utility.
Cis-retinal/opsin can have selectable function even if disconnected from
the nervous system. But it certainly can have selectable function if it
interacts biochemically with the nervous system sending a signal about
the level of light received. And it does. Differently in vertebrates
and invertebrates, and not by making an entire new system from scratch,
but by interacting with a pre-existing pathway of nerve signalling
biochemistry.

> It really is nothing of the
> sort. It is on the same level as blind cavefish evolving their eyes
> back again.

No. The difference is that the TTSS system has an independent
selectable utility. The lost eye/vision in blind cavefish does not, but
the precursor to the vertebrate camera eye did -- it was useful for
vision, but not *as* useful as the camera eye.

> Without the pre-established code already being there and
> working at that type and level of functional complexity in the
> ancestors of that organism, such levels of complex function would not
> evolve in trillions upon trillions of years.

Have you read the Nilsson and Pelger paper?



> > The interesting thing, of course, is that this experiment (selection for
> > the evolution of galactosidase activity) actually has been run and your
> > model of calculating the odds has been tested. That is, the prediction
> > based on your methods of determining the odds of evolving a new function
> > has been subject to test. Did it pass the test for *all* of the
> > examples, or only for the examples where you start with a random protein
> > or a random sequence?
>
> You must understand that we are talking averages here. What is the
> average time required to evolve a new function at a particular level
> of complexity?

Evolution does not work via *averages* or *random* sequences. It works
by modifying pre-existing systems and structures and does so in a
step-wise fashion with frequent positions of independent selective
utility. One can no more calculate the time required to evolve a new
function from knowing only its particular level of complexity than one
can calculate the time it takes a car to reach Indianapolis by knowing
only the complexity of the make and model of the car. [Knowing how far
the car is from Indianapolis is much more useful in e