
Determining the Functional Specificity Requirement - addition vs. multiplication


Seanpit

unread,
Jul 23, 2006, 12:19:40 PM7/23/06
to
> Von R. Smith wrote:

< snip >

> "...A function that has greater minimum size requirements (given a
> constant degree of specificity) with[sic; I assume you meant "will"] be
> at a higher level of functional complexity. The same is true for a
> function that requires a greater degree of specificity (given a
> constant minimum size requirement). And, obviously, the same is true
> for functions that require both a greater minimum size and specificity
> requirement."
>
> Some important points to remember are:
>
> 1) The specificity is required for "the function in question", which
> in this case is the 2,4-DNT degradation pathway. It is *not* just for
> *any* function at all.

Right . . . But, since this function is a cascading function, it does
not require specific 3D orientation. Therefore, its degree of
specificity will be exponentially reduced relative to a function that
uses the same amount of genetic real estate, but does require specific
3D orientation of its parts. In comparison then, the relative
specificity of a cascading system will be pretty much equivalent to its
most complex single protein part.

> 2) Minimum size requirement figures into the measure of functional
> complexity, independently of "sequence specificity".

Minimum size, as a measure of complexity, is not independent of
sequence specificity. Both of these minimum requirements must be taken
together in order to calculate functional complexity. They have no
independent utility in this regard.

> 3) Sean *measures* functional complexity by the metrics of specificity
> and minimum size requirement, or at least he says he does. If his
> conclusions about the "functional complexity" of an enzyme cascade are
> to have any rigor, they must be honestly and accurately based on this
> metric, not tailored with special rules to fit his desired conclusion.

The minimum size and specificity requirements show that cascading
functions have exponentially less required specificity than does a
function of equivalent size that also requires specific 3D orientation
of all its parts relative to all the other parts.

> 4) Disclaimer: Before we get much deeper into this, I want people to
> understand that I don't actually endorse Sean's methodology or even
> consider it particularly relevant to how biochemistry actually works.
> I am, however, humoring his terminology and method of argument for the
> purposes of the current discussion, to show that his claim that:
>
> "Even with an entire gene pool
> of many thousands of functional
> sequences evolution will not create a
> novel functional system *of any kind* (not
> just a particular teleologic system)
> that requires a minimum of more
> than 3 or 4 thousand fairly
> specified bp of genetic real
> estate."
>
> is false, even by his own definitions of his home-grown terminology.

You don't understand that your cascading function is completely out of
the league of non-cascading functions with regard to degree of minimum
required specificity. The math is obvious, yet you still don't get it.

> > > This is a non-standard usage of the term "specificity", but we'll let
> > > that pass. Now, it is trivial that it takes a longer sequence to code
> > > for the entire 2,4-DNT pathway than it does for any individual step in
> > > that pathway. So unless you think poor dntAbAcAd can do the entire
> > > pathway all by itself, then the pathway must have a greater "minimum
> > > size requirement" than the largest individual gene within it. And
> > > unless you are saying that any random sequence can fill in the rest of
> > > the "size requirement" in order to perform the rest of the steps of the
> > > pathway then there must be at least some "limitations of character
> > > differences" within the rest of the coding sequence.
> >
> > An enzymatic cascade has relatively low specificity because of several
> > reasons. Each step in the cascade may be beneficial all by itself,
> > without the subsequent cascade.
>
> Irrelevant. By your own stated rule, specificity is to the "function
> in question", not just any function at all.

Yes, that's what I'm talking about - use of 2,4-DNT to some advantage.
That is the function in question here.

> And yes, of course the
> individual components and subassemblies can be beneficial by
> themselves; that would be expected if the cascade had evolved in the
> incremental, selectable fashion envisioned by the *actual* ToE you
> profess to understand and argue against, rather than the strawman
> version of it you keep telling us you repudiate. "Irreducible
> complexity" is your bugbear, not mine.

Flagellar motility cannot be realized at all, not even a little bit,
until over 20 proteins, each of them fairly specified, are oriented
with a high degree of specificity relative to each other in 3 dimensions.
This is not true of your enzymatic cascade - which does not require 3D
orientation, and in which 2,4-DNT may be useful well before the entire
cascade is complete.

> Practically any biological system you might name is going to have this
> sort of modular structure, including your favorite examples at both
> ends of your "ladder of complexity": the flagellum (the export system
> alone is useful) and cytochrome-c (just the part binding heme is
> useful). This isn't some sort of special property unique to cascades,
> so why pretend that it is?

Yes, there may be subsystems with functions of their own among the
subparts of a flagellar system of motility, but the motility function
will not be realized until all the parts are in place. On the other
hand, 2,4-DNT utility may be realized well before the entire cascading
pathway is in place. This is only part of the reason why cascading
systems are not in the same league with systems like flagellar motility.

> Your ploy here sounds suspiciously like an attempt to define away any
> possible evolution of complex structures: if a function requiring
> several proteins can be built up through selectable, individually
> useful intermediates, then it doesn't "count" as evolution of a complex
> structure.

Not at all. I'm saying that if a particular type of function can be
realized early on, like 2,4-DNT utility, before the entire cascade is
complete, then it isn't an example of a function that requires all the
parts before that function, in particular (not some other type of
function) can be realized.

> Under these rules the *only* examples that would count
> would be ones that evolved via your "neutral-drift" strawman version of
> evolution, a strawman you disown whenever challenged on it, but then
> try to slip right back in at the next opportunity.

You don't understand the difference between cascading functions and
those functions, like flagellar motility, that require specific 3D
orientation of all the parts at the same time before the function in
question can be realized even a little bit.

> > Enzymatic activity by itself doesn't
> > require a great deal of size or specificity in order to achieve minimum
> > usefulness (as I've previously showed you with a 1200aa lactase that
> > could work with just 400aa). 3D orientation of the individual enzymes
> > is not required.
> >
> > But, for argument's sake, lets say that all the enzymes in a particular
> > cascade are required before a benefit will be gained.
>
> No, let's not say that for argument's sake, because that isn't the
> point under argument. Nobody is saying that the entire cascade must be
> in place for *a* benefit to be gained.

We are talking about 2,4-DNT utility here. The benefit will be the
ability to use 2,4-DNT to some advantage. That is the function in
question here. If this function can be realized without the entire
cascade in place, then it cannot be used as a function that requires
the entire cascade to be in place. Flagellar motility, on the other
hand, does require the entire collection of protein parts to be in
place. See the difference?

> The issue is what is required
> to degrade 2,4-DNT sufficiently for an organism to use it as its sole
> carbon, nitrogen, and energy source (the "function in question").
> Let's keep that in mind.

Are you sure that the entire cascade must be in place before 2,4-DNT
can be used as the sole carbon source? That was my original question.
However, even if it is true that the entire cascade must be in place
before this function can be realized, it doesn't help you to any
significant degree. Why? Because cascading enzymatic systems do not
require specific 3D orientation of all the subparts of the cascade.
The lack of this specificity requirement dramatically reduces the
overall specificity of the system in comparison to a system of the same
size that does have this requirement (there is a huge exponential
difference).

> > The fact that 3D
> > orientation is not required dramatically reduces the specificity.
>
> How? AFAICT, "specificity" sensu Pitman basically means the degree of
> sequence constraint required to conserve function. I am not sure what
> you mean by "3D orientation", but my best guess is that it is something
> to do with the degree to which the proteins combine to form an actual
> structure or spatial arrangement.

Yes, the requirement for a specific 3D orientation of the subparts,
before the function in question can be realized, exponentially
increases the functional complexity of that system via an exponential
increase in sequence specificity.

You seem to have the notion that if a system of function requires two
proteins, A and B, coded for by two genes, X and Y, then my requirements
are nothing more than a simple addition of the base pairs in both genes.
This simply isn't true. You have to take into account the requirements
of the system. Does the system in question require specific orientation
of the parts relative to each other? If so, the degree of specificity
increases exponentially - not additively.

> If that is an accurate statement of your notion, then I would maintain
> that you are probably wrong to assume a simple, straightforward
> relationship between complexity of quaternary structure and
> conservation of primary structure. AIUI, some structurally simple
> peptides can have (and possibly require) high sequence conservation,
> such as histone, whereas other structures requiring a larger-scale and
> more complex "3d orientation" might have relatively low constraints on
> sequence, such as hemoglobin.
>
> So before I entertain this particular claim, I would like to see an
> actual justification for it. Maybe a chart plotting the relation of
> sequence conservation to some index of "3D orientation", with some
> actual values derived from published data.

Take the flagellar system, for example. The individual protein parts
within this system each require a fair degree of sequence specificity,
as well as size, in order to work properly within the larger system of
flagellar motility. On top of this, they are also required to be in a
very specific orientation with the other parts of the system. This is
unlike your cascading enzymatic system, where the individual parts of
the cascade are not required to be specifically oriented with any of
the other parts of the system in 3D space in order for the overall
system to work properly.


> > For example,
>
> *interrupts*. The below is not an example of anything we are
> discussing. This is another of your weak language analogies. If
> you're going to present something and call it an "example", at least
> give a relevant one from biology.

Oh please! Are you really this limited in your ability to comprehend
that what is described below is identical to what is happening with a
biological system? All you have to do is change the number of
potential characters per locus from 26 to 4. The formulas do not change
and neither does the significance of the result.

> > what are the odds that two specific 3-character sequences
> > (26-possible characters per position) will be present in a long string
> > of say, 1 billion randomly typed 3-character sequences (3 billion
> > characters total)? The total number of possibilities is 17,576
> > (sequence space for 3-character sequences). The odds that a particular
> > 3-character sequence will not be anywhere in this sequence is
> > (17,575/17,576)^1e9 = 6e-24,711. So, the odds that 1 specific
> > 3-character sequence will be present in our sequence is very good at
> > 1-6e-24,711 = very very near 100%. So, what are the odds that 2
> > specific 3-character sequences will be there? Also very close to 100%.
> > (i.e., 0.99999 . . . * 0.999999 . . . ).
> >
> > Now, what about a 6-character sequence. What are the odds that a
> > specific 6-character sequence will be present in these 3 billion
> > characters? The total number of possibilities is 308,915,776. The
> > odds that a particular 6-character sequence will not be anywhere in the
> > string of 3 billion characters is 308,915,775/308,915,776 ^ 5e8 =
> > ~0.20. So, the odds that 1 specific 6-character sequence will be
> > present in our sequence isn't nearly as good at 1 - 0.20 = 0.80 or 80%.
> > So, what are the odds that 2 specific 6-character sequences will be
> > found within our 3 billion character string? Only 64% chance.
> >
> > What about a 12-character sequence? Sequence space = ~95e15
> > possibilities. The odds that a particular 12-character sequence will
> > not be anywhere in the string of 3 billion characters is 95e15-1/95e15
> > ^ 2.5e8 = ~ 0.999999997. So, the odds that 1 specific 12-character
> > sequence will be present in our sequence is much much less likely at 1
> > - 0.999999997 = ~1e-9. The odds that 2 specific 12-character sequences
> > will be present is about 1e-18.
> >
> > This is why specific shorter sequences that need only be specified
> > independent of the other sequences in the cascade, do not carry nearly
> > the degree of specificity that a function carries which requires all
> > its parts to be specifically oriented relative to each other into a
> > unified whole. A function that requires two highly specified
> > 3-character sequences that are not also required to be in a specific
> > orientation/relationship with each other does not carry nearly the
> > degree of specificity of a function that requires a 6-character
> > sequence where all the characters must be oriented relative to all the
> > others. The degree of specificity is exponentially greater for the
> > 6-character function vs. the function that requires two independently
> > acting 3-character parts.
>
> Once more, let us recall that this is the claim I am presenting the
> 2,4-DNT cascade to dispute:
>
> "No function that requires a minimum of more than 3 to 4 fairly
> specified Kb of genetic real estate ever evolves in reality - there is
> not one example."
>
> I am not asking you about the *degree* of specificity compared to some
> other sequence you believe to be even more complex.

It seems that what you are trying to argue is that all systems that
require 3 to 4 thousand base pairs of genetic real estate are "fairly
specified". They aren't. Given the same size requirement and degree of
specificity for the individual subparts of two different systems, one a
cascading system and the other a non-cascading system that requires
specific 3D orientation of each part with all the others, the cascading
system will not have anywhere near the overall specificity of the other
system - not even remotely close.

> I am simply
> asking for a count of the number of "fairly specified" base pairs
> required to code for "the function in question".

And I've told you the answer over and over again. The equivalent
degree of fairly specified base pairs in a cascading system, as
compared to a system that requires specific 3D orientation of all of
its subparts, is the size of its most complex subpart.

> That number, if it is
> to be meaningful, shouldn't change just
> because you can imagine a more
> complex system than the one we are examining.

You are trying to compare apples and oranges here. System specificity
is dramatically affected by the requirement for 3D orientation of all
of the subparts. That is part of the measure of system specificity.
Cascading systems of function have very very low system specificity -
relatively speaking. They simply are not "fairly specified" in
comparison to a system of equivalent size that does require specific 3D
orientation.

> To go back to your
> "example", even if we grant your argument that your 6-character
> function is more specified than your pair-of-3-character function, the
> fact remains that both require a total of six specified characters.

That's true . . . However, the minimum size requirement, by itself, is
not a measure of functional complexity. Measuring functional
complexity requires that the calculation include both variables at the
same time - both size and specificity.

> This point is especially pertinent when we remember that we are talking
> about proteins, in which the most highly conserved and functionally
> crucial residues are usually spaced apart singly or in very small
> groups, rather than together in large clumps.

It doesn't matter as long as the large clumps require specific
orientation with all the other large clumps. This 3D orientation
requirement dramatically increases the overall specificity of the
protein as well as of a system of proteins.

What it boils down to is the ratio of sequences in sequence space that
could give rise to the system in question vs. the ones that could not.
When you do not have a specific 3D orientation requirement, the size of
the sequence space that must be considered is dramatically reduced.
This ends up dramatically increasing the ratio or odds of finding the
needed parts in a given amount of time.

> > This is why cascading enzymatic systems are not comparable, as far as
> > functional complexity is concerned, to higher level systems like
> > flagellar motility.
>
> I am not asking for a comparison, I am asking for a count. If being
> "fairly specified" is actually a property of the sequence in question,
> I should be able to determine it without such comparisons, simply by
> estimating the minimum size and degree of sequence constraint of the
> primary structure.

Your mistake in estimating specificity is in thinking that the parts of
a system that do not require specific orientation with all the other
parts can simply be added up to produce a total. This isn't true. You
can only add up the numbers like you do if the parts do have this
additional specificity requirement. What you are actually doing in this
"addition" is not adding at all. You are actually multiplying.

For example, what is 10^20 * 10^20? Isn't the answer 10^40? In other
words, you are multiplying the numbers, which is done by addition of
the exponents. What happens with cascading systems is that you can't
multiply the individual numbers. You can only add them together: 10^20
+ 10^20 = 20^20. That is a far different number - right? As far as a
measure of specificity is concerned, this number is not nearly as
"specified" as the 10^40 number. In other words, there is a far greater
ratio of potentially workable sequences on the one hand than on the
other.

> > > So now that we have established that the answer to my question cannot
> > > be "the minimum coding sequence for the largest protein in the
> > > pathway", let me ask again: How many of the base pairs in the genetic
> > > sequences encoding the 2,4-DNT pathway are "fairly specified"?
> > >
> > > Let me give you some help with the numbers. You have yourself
> > > identified 4 enzymes that are central to and specific to the cascade in
> > > question. Here are the smallest sequences performing their functions
> > > that I could find in the NCBI database:
> > >
> > >
> > > dntAb ~ 104
> > > dntAc ~ 447
> > > dntAd ~ 194
> > >
> > > dntB ~ 548
> > >
> > > dntD ~ 314
> > >
> > > dntG ~ 281
> > >
> > > for a total of 1,888 amino acids.
> >
> > As explained above, one simply cannot add up the numbers like one could
> > for a non-cascading system were each part required specific 3D
> > orientation against all the other parts of the system.
>
> You misspelled "as asserted above".

Read it again and reconsider the notion that cascading systems are
remotely "fairly specified". They just aren't.

> Let us recall what you said about how to measure specificity:
>
> "As I've told you over and over again, specificity is a description of
> the minimum sequence order or limitations of character differences
> within a sequence that can be sustained without a complete loss of the
> function in question. A function that has greater minimum size
> requirements (given a constant degree of specificity) will be at a
> higher level of functional complexity. The same is true for a function
> that requires a greater degree of specificity (given a constant minimum
> size requirement). And, obviously, the same is true for functions that
> require both a greater minimum size and specificity requirement."
>
> So what I want from you is: *first* tell me how to determine the
> minimum size requirement for coding the *entire* 2,4-DNT pathway, since
> that is the "function in question",

Given that the function in question, use of 2,4-DNT as the sole carbon
source, cannot be realized without all the enzymes in the pathway, I'm
fine with a total of about 2,000aa or 6,000bp.

> *then* you tell me how to measure
> the degree of sequence conservation required,

I already did - above. Plug in the numbers and use the same formula to
get your answer. The degree of specificity of a 2,000aa cascading
system, relative to the degree of specificity of a 2,000aa function
that requires all of its parts to be specifically oriented relative to
each other, will be exponentially less, so that the degree of complexity
is basically a measure of its most complex subpart - like I pointed out
originally.

> *then* we will have an
> answer as to how many "fairly specified"
> base pairs there are in the coding sequence.

Great! ; )

> I don't want your assertions about the complexity of enzyme cascades.

The complexity of enzymatic cascades is tied to specificity - which
just isn't remotely close to the level of being "fairly specified".

> I want a method of measurement that independently verifies your
> assertions, you know, one that *doesn't* use made-up ad hoc rules meant
> to save an already-decided-upon conclusion.

These aren't ad hoc rules since they haven't changed. The rules have
always been "minimum sequence size and specificity". Cascading systems
have the size, but not the specificity.

> You have stated rules for measuring complexity, Sean. I want to see
> you apply them consistently, rigorously, and honestly. First, I want
> you to *measure* the complexity of the 2,4-DNT pathway, using your
> stated parameters of minimum size requirement and degree of primary
> structure constraint; *then* we can discuss whether your assertion
> about it being no more complex than the largest enzyme in it is
> correct.

The equivalent size and specificity requirements for a cascading system
are very close to the measure of the subpart with the greatest size and
specificity requirements. If you want to get really technical, the
total requirement is an addition of the sequence space ratio for all
the parts together.

For example, if each part in a two-part cascading system had a sequence
space ratio of 1e-40, then the total ratio for the entire system would
be 1e-40 + 1e-40 = 2e-40.

However, if each part of a two-part system which required specific
orientation of the parts relative to each other had a sequence space
ratio of 1e-40, then the total ratio for the entire system would be
1e-40 * 1e-40 = 1e-80.

See the difference?
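To make the contrast concrete, here is a minimal Python sketch of the two
combining rules being argued over here - summing the per-part ratios, as
proposed for a cascading system, versus multiplying them for a system whose
parts must be specifically oriented. The 1e-40 per-part ratios are the
hypothetical figures from the example above, not measured values for any
real protein.

part_ratios = [1e-40, 1e-40]  # hypothetical sequence-space ratio of each part

# Rule proposed above for a cascade: add the per-part ratios.
cascade_ratio = sum(part_ratios)

# Rule for parts that must all be specifically oriented: multiply the ratios.
oriented_ratio = 1.0
for r in part_ratios:
    oriented_ratio *= r

print(cascade_ratio)   # 2e-40
print(oriented_ratio)  # 1e-80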

As another example, consider that in 1992 Yockey published a ratio of 1
in 1e36 as his estimate of how many functional cytochrome c (CC)
sequences there are in sequence space relative to non-functional
sequences.

In this light, it is also interesting to consider the work of Robert
T. Sauer and his M.I.T. team of biologists, who undertook the
scientific research of substituting the 20 different types of amino
acids in two different proteins. After each substitution, the protein
sequence was reinserted into bacteria to be tested for function. They
discovered that in some locations of the protein's amino acid chains
up to 15 different amino acids may be substituted, while at other
locations there was a tolerance of only a few, and yet other locations
could not tolerate even one substitution of any other amino acid. One
of the proteins they chose was the 92-residue lambda repressor. Sauer
calculated that: "... there should be about 10^57 different allowed
sequences for the entire 92 residue domain. ... the calculation does
indicate in a qualitative way the tremendous degeneracy in the
information that specifies a particular protein fold. Nevertheless,
the estimated number of sequences capable of adopting the lambda
repressor fold is still an exceedingly small fraction, about 1 in
10^63, of the total possible 92 residue sequences."

Now, if you have a system that requires more than 1 protein, how do you
calculate the ratio for the entire system - the multiprotein system as
a whole? Well, for systems that do not require specific orientation of
the subparts with each other, you add up the ratios. But, for systems
that require specific orientation of the subparts, you multiply the
ratios.

References:

Yockey, H.P., On the information content of cytochrome C, Journal of
Theoretical Biology 67 (1977): 345-376.

http://www.detectingdesign.com/PDF%20Files/Cytochrome%20C%20Variability.doc

Yockey, H.P., Information Theory and Molecular Biology, Cambridge
University Press, 1992.

Sauer, R.T., James U. Bowie, John F.R. Olson, and Wendall A. Lim,
Proceedings of the National Academy of Sciences USA 86 (1989):
2152-2156; Science 247 (March 16, 1990); and Olson, J.F.R., and R.T.
Sauer, Proteins: Structure, Function and Genetics 7 (1990): 306-316.

> It seems to me that, by your own stated criteria for measuring
> "specification" (and for that matter, "functional complexity"), I *can*
> measure it the way I have described because:
>
> 1) all the enzymes named are required for the function in question (so
> they all figure into the "minimum size requirement" for the cascade)

Even if they are all required, you cannot multiply them together like
you did. You must add their individual degrees of complexity together.
Your mistake is that you tried to multiply them together. You just
can't do that for a cascading system.

> 2) each of the enzymes has some degree of sequence constraint; their
> functions couldn't be effectively performed by just any random peptides
> of the same length (so they all figure into the specificity/sequence
> constraint requirement, as well).

That's right. So, to find out the total constraint for the entire
cascading system, you add up the degrees of individual constraints. In
other words, you add up the ratios for each protein part in a cascading
system. However, you multiply the ratios for a system that requires
specific orientation of each part relative to all the other parts.

> You yourself were quick to make the point that each of these enzymes
> has an individual function that could benefit an organism that produced
> it. Presumably, each of these individual functions has a non-zero
> minimum size requirement, and a non-zero degree of sequence constraint
> (you can't perform it with just any random peptide of the appropriate
> length). This undermines your "cascade rule", rather than supporting
> it. If each of these sub-functions has a minimum size requirement and
> non-zero specificity, and each is required for the pathway, then at
> very least the total minimum size requirement and specificity of the
> pathway is the sum of those sub-functions.

That's right - but that's not what you did. I know it must not have
seemed like it to you, but what you did was to multiply the
specificities of each of the subparts. You didn't add them - you
really multiplied. Yet, you can only multiply the ratios if the
function in question requires specific orientation of each subpart.

> > Basically, what
> > you have here is a system of complexity that isn't much more complex
> > than its most complex subpart (i.e., 548 fairly specified aa).
>
> Sure it is, because it has a larger minimum size requirement. Here,
> let me help you remember your own rule:
>
> "...A function that has greater minimum size requirements (given a
> constant degree of specificity) with be at a higher level of functional
> complexity."
>
> So unless you are saying that there is some offsetting *decrease* in
> the specificity of the largest enzyme in the cascade, than the entire
> cascade is more complex by your own stated rules.

Yes, the entire cascade is more complex by my own rules, but not to any
significant degree. The relative degree of increase is hardly worth
mentioning. It's the difference between going from 10^20 to 20^20 vs.
10^40.

< snip repetitions >

> > 548aa = ~1644bp of genetic real estate (given that this enzyme cannot
> > be significantly reduced in size and is already fairly specified in and
> > of itself).
>
> Well, that's a number all right. Unfortunately, it only comes from
> your special "cascade rule", and not from any consistent application of
> your criteria of minimum size requirement or sequence constraint.

You just don't understand that you are really multiplying when you
should be adding.

Sean Pitman
www.DetectingDesign.com

Von R. Smith

unread,
Jul 23, 2006, 6:15:08 PM7/23/06
to
Seanpit wrote:
> > Von R. Smith wrote:


While I dislike posting multiple copies of the same post, I felt I
ought to do it here since Sean has begun a whole new thread with this
one.


< snip >


> >
> > I am not asking for a comparison, I am asking for a count. If being
> > "fairly specified" is actually a property of the sequence in question,
> > I should be able to determine it without such comparisons, simply by
> > estimating the minimum size and degree of sequence constraint of the
> > primary structure.
>
> Your mistake in estimating specificity is in thinking that the parts of
> a system that do not require specific orientation with all the other
> parts can simply be added up to produce a total. This isn't true. You
> can only add up the numbers like you do if the parts do have this
> additional specificity requirement. What you are actually doing in this
> "addition" is not adding at all. You are actually multiplying.
>
> For example, what is 10^20 * 10^20? Isn't the answer 10^40? In other
> words, you are multiplying the numbers, which is done by addition of
> the exponents. What happens with cascading systems is that you can't
> multiply the individual numbers. You can only add them together: 10^20
> + 10^20 = 20^20.


ITYM 1e20 + 1e20 = 2e20. Big difference.

> That is a far different number - right? As far as a
> measure of specificity is concerned, this number is not nearly as
> "specified" as the 10^40 number. In other words, there is a far greater
> ratio of potentially workable sequences on the one hand than on the
> other.

So, this is it? The basis of your cascade rule is the sort of math
mistake a C-student might make in his first week of an intro prob/stat
course?

I have seen Sean say some incompetent things about probabilities
before, but this one takes the cake. This isn't even wrong; it is
Zoemath written at a higher reading-level.

The probability of two independent events occurring is the product of
the probabilities of each one occurring. Period. It doesn't matter
whether you're talking about enzymes in a cascade, structural proteins
in a flagellum, letters of an alphabet, or lottery numbers. If you
want to argue that the order in which the proteins are coded is
unimportant for a cascade (which I find a bit questionable), then you
can multiply that product by N!, with N being the number of steps being
coded for. That would be a pretty big swing for something requiring 20
steps (20! is approximately 2.4e18), but considerably less so for a
cascade of two to four steps (2! and 4! being 2 and 12, respectively).
So an enzyme cascade of four enzymes would, by your logic, have a
twelve times higher probability than a structure in which a rigid
"right order" matters.

If you want to argue that the proteins that have to fit together in
some sort of 3d conformation are more specific than those that do not
(hence more constrained in sequence, and hence rarer in sequence
space), you should represent that simply by assigning them lower
individual sequence densities (say, 1e-40 instead of 1e-20), although
the biochemical justification for such an assumption is shaky, as we
see in the examples of hemoglobin and histone.

Or, if you want to argue that "3-D confirmation" requires something
more than just the right structural sequences, you could express that
by requiring additional events above and beyond those coding for the
sequences.

But P(E1*E2) is going to be some function of P(E1) * P(E2) any way you
slice it, cascade or no. The only time one adds probabilities is when
one is calculating the probability of either one event *or* the other
one occurring. This is basic prob/stat, Sean.
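For reference, a short Python sketch of the calculation described in this
post - multiplying the per-step probabilities and, if the order of the coded
steps is taken to be unimportant, multiplying the product by N!. The
per-step probability of 1e-10 is purely illustrative (chosen small enough
that the product does not underflow a double), and the 20-step case matches
the 20! figure mentioned above.

import math

p_step = 1e-10   # illustrative probability of hitting one required sequence
steps = 20       # the 20-step example mentioned above

p_ordered = p_step ** steps              # one rigid "right order"
order_factor = math.factorial(steps)     # about 2.4e18, as noted above
p_any_order = p_ordered * order_factor   # order of the steps unimportant

print(f"{p_ordered:.2e}")     # 1.00e-200
print(f"{order_factor:.2e}")  # 2.43e+18
print(f"{p_any_order:.2e}")   # 2.43e-182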

> The equivalent size and specificity requirements for a cascading system
> are very close to the measure of the subpart with the greatest size and
> specificity requirements. If you want to get really technical, the
> total requirement is an addition of the sequence space ratio for all
> the parts together.
>
> For example, if each part in a two-part cascading system had a sequence
> space ratio of 1e-40, then the total ratio for the entire system would
> be 1e-40 + 1e-40 = 2e-40.

Nope. The probability of two independent events occurring together is
the product of their probabilities of occurring individually, or P(E1)
* P(E2). Period.


Now, if you want to say that the order is unimportant, so that E1E2 is
just as good as E2E1, it becomes P(E1) * P(E2) * 2! = 2e-80. What you
just modelled is the probability of either E1 *or* E2 occurring in a
given trial (actually, this would be 2e-40 - 1e-80, assuming that any
overlap is random, but that's not a significant difference here).


>
> However, if each part of a two-part system which required specific
> orientation of the parts relative to each other had a sequence space
> ratio of 1e-40, then the total ratio for the entire system would be
> 1e-40 * 1e-40 = 1e-80.
>
> See the difference?


Yes, the difference is that I still remember some of the basics from my
prob/stat courses and you, apparently, do not.

Learn some math, Sean.

Seanpit

unread,
Jul 24, 2006, 12:32:27 AM7/24/06
to

Von R. Smith wrote:
> Seanpit wrote:
> > Von R. Smith wrote:
> >
>
> < snip >
>
>
> > >
> > > I am not asking for a comparison, I am asking for a count. If being
> > > "fairly specified" is actually a property of the sequence in question,
> > > I should be able to determine it without such comparisons, simply by
> > > estimating the minimum size and degree of sequence constraint of the
> > > primary structure.
> >
> > Your mistake in estimating specificity is in thinking that the parts of
> > a system that do not require specific orientation with all the other
> > parts can simply be added up to produce a total. This isn't true. You
> > can only add up the numbers like you do if the parts do have this
> > additional specificity requirement. What you are actually doing in this
> > "addition" is not adding at all. You are actually multiplying.
> >
> > For example, what is 10^20 * 10^20? Isn't the answer 10^40? In other
> > words, you are multiplying the numbers, which is done by addition of
> > the exponents. What happens with cascading systems is that you can't
> > multiply the individual numbers. You can only add them together: 10^20
> > + 10^20 = 20^20.
>
> I think you mean "1e20 + 1e20 = 2e20". Big difference, you know. :)

It really doesn't make a hill of beans difference as far as the problem
at hand is concerned. You, like John, must know what I meant. You're
like the spelling flamers. I was typing fast, but the concept is still
the same. (1 x 10^20) + (1 x 10^20) = 2 x 10^20. Or, 1e20 + 1e20 = 2e20.
Whatever . . .

> > That is a far different number - right? As far as a
> > measure of specificity is concerned, this number is not nearly as
> > "specified" as the 10^40 number. In other words, there is a far greater
> > ratio of potentially workable sequences on the one hand than on the
> > other.
>
> So, this is it? The basis of your cascade rule is the sort of math
> mistake a C-student might make in his first week of an intro prob/stat
> course?
>
> I have seen Sean say some incompetent things about probabilities
> before, but this one takes the cake. This isn't even wrong; it is
> Zoemath written at a higher reading-level.
>
> The probability of two independent events occurring is the product of
> the probabilities of each one occurring. Period. It doesn't matter
> whether you're talking about enzymes in a cascade, structural proteins
> in a flagellum, or lottery numbers.

The probabilities of two independent events occurring is the sum of the
probabilities of each one occurring - not the product.

For example, let's say that we have a box of 1 million marbles with
only 1 of them being black. If each marble has equal chances of being
picked, on average, how many times will I have to draw out a marble,
and put it back in random order, before I pick the black one? 1
million - right? How many times before I pick the black marble twice?
By your logic I'd have to draw out the marbles 1e6 * 1e6 = 1e12 or 1
trillion times before I'd pull out that black marble twice. That's
simply not true. In reality, it would take me an average of only 2
million picks to land on the black marble twice.

What you are trying to calculate is the odds of me picking the black
marble twice in a row. The odds of that happening are indeed 1e-12.
However, that isn't the correct formula to answer the question at hand.
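The marble arithmetic is easy to check by simulation. The sketch below
scales the box down to 1,000 marbles so it runs quickly; the 1-in-a-million
box behaves the same way, just scaled up. On average it takes about 2*N
draws (with replacement) to see the single black marble twice - not N*N
draws.

import random

def draws_until_k_blacks(n_marbles, k, rng):
    # Draw with replacement until the single black marble has come up k times.
    draws = 0
    hits = 0
    while hits < k:
        draws += 1
        if rng.randrange(n_marbles) == 0:   # slot 0 is the black marble
            hits += 1
    return draws

rng = random.Random(1)
trials = 2000
avg = sum(draws_until_k_blacks(1000, 2, rng) for _ in range(trials)) / trials
print(round(avg))   # about 2000, i.e. roughly 2 * 1000 draws on average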


> If you want to argue that the
> sequence in which the proteins are coded is unimportant for a cascade
> (also a bit questionable), then you can multiply that product by N!,
> with N being the number of steps being coded for. That would be a
> pretty big swing for something requiring 20 steps (20! is approximately
> 2.4e18), but considerably less so for a cascade of two to four steps
> (2! and 4! being 2 and 12, respectively). So an enzyme cascade of four
> enzymes would have a twelve times higher probability than a structure
> in which a rigid "right order" matters. That might help you a little
> bit, but not very much, and it certainly doesn't justify your cascade
> rule. It is the difference of perhaps a single "fairly-specified" codon
> or two at most.

Again, it seems to me that you are not thinking about the problem
correctly. As I originally described to you, it is far easier to find
all the parts of your multipart system in a given amount of time
because they all exist in a much much smaller sequence space.

> If you want to argue that the proteins that have to fit together in
> some sort of 3d conformation are more specific (hence more constrained
> in sequence, and hence rarer in sequence space), you should represent
> that simply by assigning them lower individual sequence densities (say,
> 1e-40 instead of 1e-20), although the biochemical justification for
> such an assumption is shaky, as we see in the examples of hemoglobin
> and histone.

Ah, but that's not my argument. I'm arguing that, given the same
individual sequence specificity per protein part, a cascading system
has a much much lower overall degree of specificity than a system that
requires all the parts to work together, where each protein has a
specific location relative to all the other protein parts.

> Or, if you want to argue that "3-D confirmation" requires something
> more than just the right structural sequences, you could express that
> by requiring additional events above and beyond those coding for the
> sequences.

I'm not making that argument either - and I don't need to in order to
be correct.

> But P(E1*E2) is going to be some function of P(E1) * P(E2) any way you
> slice it, cascade or no. The only time one adds probabilities is when
> one is calculating the probability of either one event *or* the other
> one occuring. This is basic prob/stat, Sean.

You need to consider the ratio here - the odds that the needed
sequences will actually exist in a given search space. Remember also,
you have a great many searchers searching out the sequence space at the
same time - many more than 4 or 5 searchers. So, if all the sequences
exist in the same given sequence space, they will all be found at about
the same time (i.e., 5 black marbles will be found by 1000 searchers at
about the same time, on average).

Again, as I pointed out to you before, you need to ask yourself, what
are the odds that two specific 3-character sequences (26-possible
characters per position) will be present in a long string of say, 1
billion randomly typed 3-character sequences (3 billion characters
total)? The total number of possibilities is 17,576 (sequence space
for 3-character sequences). The odds that a particular 3-character
sequence will not be anywhere in this sequence is (17,575/17,576)^1e9 =
6e-24,711. So, the odds that 1 specific 3-character sequence will be
present in our sequence is very good at 1-6e-24,711 = very very near
100%. So, what are the odds that 2 specific 3-character sequences will
be there? Also very close to 100%. (i.e., 0.99999 . . . * 0.999999 .
. . ).

Now, let me add something to this that I thought was obvious before,
but I guess not. The amount of time needed to find both of these
sequences will be pretty much the same amount of time needed to find
just one of these sequences - since there are many mutations throughout
the genome happening concurrently. And, even if there were only one
searcher, the time needed for one searcher to find both sequences would
be 2 * the time needed to find one string. The time needed is not
multiplied - it is added. If it takes an average time of 1 year to
find a 3-character sequence, it will take an average of just 2 years
for one searcher to find both of the correct 3-character sequences.
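The presence-probability figures quoted above can be reproduced in a few
lines of Python, working in log space because the raw probabilities
underflow ordinary floats. The sketch keeps the same simplifying assumption
as the original calculation: the 3-billion-character string is treated as a
run of non-overlapping blocks of the target length.

import math

def log10_prob_absent(word_len, total_chars, alphabet=26):
    space = alphabet ** word_len          # e.g. 17,576 for 3-character words
    blocks = total_chars // word_len      # non-overlapping blocks assumed
    return blocks * math.log1p(-1 / space) / math.log(10)

for k in (3, 6, 12):
    print(k, log10_prob_absent(k, 3_000_000_000))
# 3  -> about -24710  (absent ~6e-24711, so essentially certain to be present)
# 6  -> about -0.70   (absent ~0.20, present ~0.80)
# 12 -> about -1.1e-9 (present with probability on the order of 1e-9)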

> > The equivalent size and specificity requirements for a cascading system
> > are very close to the measure of the subpart with the greatest size and
> > specificity requirements. If you want to get really technical, the
> > total requirement is an addition of the sequence space ratio for all
> > the parts together.
> >
> > For example, if each part in a two-part cascading system had a sequence
> > space ratio of 1e-40, then the total ratio for the entire system would
> > be 1e-40 + 1e-40 = 2e-40.
>
> Nope. The probability of two independent events occuring together is
> the product of their occuring individually, or P(E1) * P(E2). Period.

Nope. You're actually thinking about two independent events occurring
in a row.

> > However, if each part of a two-part system which required specific
> > orientation of the parts relative to each other had a sequence space
> > ratio of 1e-40, then the total ratio for the entire system would be
> > 1e-40 * 1e-40 = 1e-80.
> >
> > See the difference?
>
> Yes, the difference is that I still remember some of the basics from my
> prob/stat courses and you, apparently, do not.
>
> Learn some math, Sean.

; )

Sean Pitman
www.DetectingDesign.com

Richard Forrest

unread,
Jul 24, 2006, 3:33:44 AM7/24/06
to

Seanpit wrote:
<snipped>

>
> The probabilities of two independent events occurring is the sum of the
> probabilities of each one occurring - not the product.
>

For crying out loud, Sean, learn some basic mathematics.

You are making an argument based on probabilities, yet are so freaking
ignorant about the mathematics of probability that you make an
assertion as stupid as this?

Get real.

RF

<snipped>

Augray

unread,
Jul 24, 2006, 7:32:48 AM7/24/06
to
On 24 Jul 2006 00:33:44 -0700, "Richard Forrest"
<ric...@plesiosaur.com> wrote in
<1153726424....@m79g2000cwm.googlegroups.com> :

Yes, this is jaw-dropping stuff. What does Sean believe the
probability is of two independent, yet certain, events occurring to
be? 2 in 1?


>RF
>
><snipped>

Von R. Smith

unread,
Jul 24, 2006, 10:08:25 AM7/24/06
to

Rather than write a long response detailing what is wrong with this, I
have a better idea. Sean, you have math instructors at Loma Linda,
right? Why don't you ask them if your math claims above make any
sense. Let us know what they say. And I don't mean vague
characterization, either; I want to know exactly what you asked them
and exactly what they said. Math doesn't paraphrase well.

Seanpit

unread,
Jul 24, 2006, 11:23:12 AM7/24/06
to

Von, you need some help in understanding how to apply statistical
formulas. Go and ask one of your local math or statistics professors
for some help on this one.

Until then, let's try to make it even simpler for you. How many times
would you have to roll a dice to get "6"? You'd have to roll the dice 6
times, on average. Are you with me so far? Now, how many times would
you have to roll the dice, on average, to get 6 twice? You'd have to
roll the dice 12 times. That's right, only 12 times. Do the experiment
yourself, Von. If you roll the dice 12 times, on average, you will roll
the number six twice per set of 12 rolls.

What you are trying to calculate is the number of rolls needed to roll
six twice in a row. The answer to that question is 36 rolls. That's not
the question we are trying to answer in this case, Von. The question at
hand is how many rolls it will take to get two sixes. It makes a big
difference.

Sean Pitman
www.DetectingDesign.com
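A quick simulation of the waiting-time claim above - the mean number of
rolls of a fair die needed to see one six, and to accumulate two sixes in
any positions (not necessarily back to back):

import random

def rolls_until_sixes(target, rng):
    rolls = 0
    sixes = 0
    while sixes < target:
        rolls += 1
        if rng.randint(1, 6) == 6:
            sixes += 1
    return rolls

rng = random.Random(0)
trials = 100_000
for target in (1, 2):
    avg = sum(rolls_until_sixes(target, rng) for _ in range(trials)) / trials
    print(target, round(avg, 2))
# 1 -> about 6 rolls on average
# 2 -> about 12 rolls on average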

Seanpit

unread,
Jul 24, 2006, 11:26:18 AM7/24/06
to

The odds that they will occur in a given set of a sum of their
independent odds is 50:50. For example, how many times would you have
to roll a dice to get the side with six dots on it? You'd have to roll
the dice 6 times, on average. Are you with me so far? Now, how many
times would you have to roll the dice, on average, to get "6" twice?
You'd have to roll the dice 12 times. That's right, only 12 times. Do
the experiment yourself and you will see that this is true. If you
roll the dice 12 times, on average, you will roll the number six twice
per set of 12 rolls.

> >RF
> >
> ><snipped>

Seanpit

unread,
Jul 24, 2006, 11:31:50 AM7/24/06
to

Oh really? If I am so far off base, Richard, please do tell me how
many picks I'd have to make, on average, before I draw out the one
black marble from 1 million marbles two times? Or, how many times would
I have to play the California lottery to win it twice with odds of 1 in
250 million?

>
> RF
>
> <snipped>

Von R. Smith

unread,
Jul 24, 2006, 12:06:22 PM7/24/06
to

Sean, this is correct, but it doesn't model anything relevant.

You aren't even describing the probability of two independent events.
You are describing the probability of a single event occurring r times
in N trials (if you've taken any prob/stat, as I have, you would
understand the difference). Get back to me after you've heard back
from your math faculty.

Ken Denny

unread,
Jul 24, 2006, 12:25:25 PM7/24/06
to
Seanpit wrote:
>
> For example, how many times would you have
> to roll a dice to get the side with six dots on it? You'd have to roll
> the dice 6 times, on average. Are you with me so far?

No. I'm not with you. I only have to roll the die between 3 and 4 times
on average to get the side with six dots. The probability of getting a
6 on one of the first three rolls is .422. The probability of getting
it by the fourth roll is .519.
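The figures Ken cites come from the cumulative probability of seeing at
least one six within the first n rolls of a fair die, which is easy to
tabulate:

for n in range(1, 7):
    p_at_least_one_six = 1 - (5 / 6) ** n
    print(n, round(p_at_least_one_six, 3))
# 1 -> 0.167
# 2 -> 0.306
# 3 -> 0.421
# 4 -> 0.518  (the 50% mark is crossed on the fourth roll)
# 5 -> 0.598
# 6 -> 0.665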

B Richardson

unread,
Jul 24, 2006, 12:20:55 PM7/24/06
to

Are you talking about double sixes or a single die?


Richard Forrest

unread,
Jul 24, 2006, 12:34:08 PM7/24/06
to

That's not what you were asserting.

Learn some basic statistics.

RF

>
> >
> > RF
> >
> > <snipped>

Dave...@aol.com

unread,
Jul 24, 2006, 12:49:50 PM7/24/06
to

Anyone with a shred of sense would be able to find out that the
probability of two independent events occurring absolutely has to be
the product, rather than the sum.

Take the die rolling example (standard 6-sided and a fair die).

The odds of not rolling a six on any given throw is 5/6.

By your argument, the odds of not rolling a six on two throws would be
10/6, since you are using the sum, not the product. But since probability
of any event cannot be greater than 1, we are left with the notion that
your proposition is false.

And if you don't know basic combinatorics (and this is stuff that is
learned in high-school level math classes!), how can we take you at
your word on anything mathematical?

And to follow:

No probability combination uses the sum by itself.

Event A AND Event B occurring: use P(A)*P(B).

Event A OR Event B occurring: use P(A)+P(B)-(P(A)*P(B)). (The last term
is there to prevent double-counting.)
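Both rules can be checked by brute force over the 36 equally likely
outcomes of two independent rolls of a fair die, taking A = "first roll is
a six" and B = "second roll is a six":

from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # all 36 pairs of rolls
p_a = sum(1 for a, b in outcomes if a == 6) / 36
p_b = sum(1 for a, b in outcomes if b == 6) / 36
p_and = sum(1 for a, b in outcomes if a == 6 and b == 6) / 36
p_or = sum(1 for a, b in outcomes if a == 6 or b == 6) / 36

print(p_and, p_a * p_b)              # 1/36 both ways
print(p_or, p_a + p_b - p_a * p_b)   # 11/36 both ways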

Dave...@aol.com

unread,
Jul 24, 2006, 1:22:24 PM7/24/06
to

To have a 50% chance of having drawn out the black marble once, you'd
be looking at roughly 693,000 attempts, not 1 million. (After 1 million
attempts, it's roughly 64% that you've drawn at least one out.)

Twice would require some extra calculations that I don't have the time
for right now.
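Dave's numbers can be reproduced directly from the geometric waiting-time
formula, with p = 1/1,000,000 per draw:

import math

p = 1 / 1_000_000   # chance of the black marble on any single draw

# Smallest n with 1 - (1 - p)**n >= 0.5
n_half = math.ceil(math.log(0.5) / math.log(1 - p))
print(n_half)                     # 693,147 -- the "roughly 693,000 attempts"

print(1 - (1 - p) ** 1_000_000)   # about 0.632 -- the chance after a million draws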

Desertphile

unread,
Jul 24, 2006, 3:51:33 PM7/24/06
to
Seanpit wrote:

> Richard Forrest wrote:

> > Seanpit wrote:
> > <snipped>


A half-dozen people, who have consistently demonstrated to you that
they are your intellectual superiors, told you that you are incorrect,
and whaddyah do? You wailed "is NOT!" and defended your error, claiming
it is not an error.

Your reply should have been "Thank you, I see that I am in error and I
will strive to not repeat the mistake."

Desertphile

unread,
Jul 24, 2006, 3:53:17 PM7/24/06
to
Seanpit wrote:

> Or, how many times would
> I have to play the California lottery to win it twice with odds of 1 in
> 250 million?

P.S. Eight billion times.

Augray

unread,
Jul 24, 2006, 4:08:41 PM7/24/06
to
On 24 Jul 2006 08:26:18 -0700, "Seanpit"
<seanpi...@naturalselection.0catch.com> wrote in
<1153754778.5...@i42g2000cwa.googlegroups.com> :

>
>Augray wrote:
>> On 24 Jul 2006 00:33:44 -0700, "Richard Forrest"
>> <ric...@plesiosaur.com> wrote in
>> <1153726424....@m79g2000cwm.googlegroups.com> :
>>
>> >
>> >Seanpit wrote:
>> ><snipped>
>> >>
>> >> The probabilities of two independent events occurring is the sum of the
>> >> probabilities of each one occurring - not the product.
>> >>
>> >
>> >For crying out loud, Sean, learn some basic mathematics.
>> >
>> >You are making an argument based on probabilities, yet are so freaking
>> >ignorant about the mathematics of probability that you make an
>> >assertion as stupid as this?
>> >
>> >Get real.
>>
>> Yes, this is jaw-dropping stuff. What does Sean believe the
>> probability is of two independent, yet certain, events occurring to
>> be? 2 in 1?
>
>The odds that they will occur in a given set of a sum of their
>independent odds is 50:50.

Please note that I used the term "certain" to describe these events.
Since the odds of such an event happening is 1 chance in 1, can I
therefore conclude that the chance of two certain events happening is
2 in 1?


>For example, how many times would you have
>to roll a dice to get the side with six dots on it? You'd have to roll
>the dice 6 times, on average.

No, that's just wrong. Have you never rolled a six on the first try?

>Are you with me so far?

Nope.


>Now, how many
>times would you have to roll the dice, on average, to get "6" twice?
>You'd have to roll the dice 12 times. That's right, only 12 times.

No, that's not true either.


>Do
>the experiment yourself and you will see that this is true. If you
>roll the dice 12 times, on average, you will roll the number six twice
>per set of 12 rolls.

That's a different experiment.


>> >RF
>> >
>> ><snipped>

Von R. Smith

unread,
Jul 24, 2006, 4:15:03 PM7/24/06
to

Note: Actually, as others have pointed out, this isn't actually
correct, either, but that isn't really the main problem with your
argument. Besides, I was too lazy to figure up a bunch of factorials.

Von R. Smith

unread,
Jul 24, 2006, 4:27:32 PM7/24/06
to


Repeating mistakes is all he has left to defend his position. Let's
recap:

10^20 + 10^20 = 20^20

The probability of two independent events occurring is the sum of the
probabilities of each occurring.

"Two independent events" = "Two occurrences of the same event in a
series of trials".

1/6 + 1/6 = 1/12 (I especially liked this one, even if he didn't quite
come out and say it in as many words).

Greg Guarino

unread,
Jul 24, 2006, 5:06:57 PM7/24/06
to
On 24 Jul 2006 08:31:50 -0700, "Seanpit"
<seanpi...@naturalselection.0catch.com> wrote:

Leaving aside the actual math here, try this thought experiment. Would
you say that the probability of winning the lottery TWICE is greater
than or less than the probability of winning it ONCE? Common sense
says that winning twice would be quite a bit rarer, correct? So it
can't be the sum.

Greg Guarino

Gerry Murphy

unread,
Jul 24, 2006, 6:19:49 PM7/24/06
to

"Seanpit" <seanpi...@naturalselection.0catch.com> wrote in message
news:1153715547.8...@i3g2000cwc.googlegroups.com...
>
<snip>

> The probabilities of two independent events occurring is the sum of the
> probabilities of each one occurring - not the product.

Sadly it seems some people really are as stupid as they seem.

Why is it that so many of the nutjobs on T.O. are any or all of innumerate,
illiterate or ignorant?

<snip>


Gerry Murphy

unread,
Jul 24, 2006, 6:24:54 PM7/24/06
to

"Seanpit" <seanpi...@naturalselection.0catch.com> manifested his
innumeracy in message
news:1153754778.5...@i42g2000cwa.googlegroups.com...

<snip>


> The odds that they will occur in a given set of a sum of their
> independent odds is 50:50. For example, how many times would you have
> to roll a dice to get the side with six dots on it? You'd have to roll
> the dice 6 times, on average.

No, you have to roll it just under 4 times, on average.

You really should stop embarrassing yourself with this display of
breathtaking ignorance.


Seanpit

unread,
Jul 24, 2006, 7:34:22 PM7/24/06
to

If you are talking about the same person winning the lottery twice in a
row, then you'd be correct. If you are talking about how many tries it
would take the same person to win the lottery twice, on average, then it
would take twice the number of tries needed to win it once.

> Greg Guarino

Seanpit

unread,
Jul 24, 2006, 7:31:21 PM7/24/06
to

Yes, it is what I am asserting. What did you think I was asserting?

> Learn some basic statistics.

Learn how to apply basic statistics . . .

>
> RF
>
> >
> > >
> > > RF
> > >
> > > <snipped>

bullpup

unread,
Jul 24, 2006, 7:48:56 PM7/24/06
to

"Gerry Murphy" <gerry_...@comcast.net> wrote in message
news:BMqdnaxqt4UK21jZ...@giganews.com...

If they weren't, they wouldn't be nutjobs.

Boikat
--
"I reject your reality, and substitute my own"
-Adam Savage, Mythbusters-

Seanpit

unread,
Jul 24, 2006, 8:11:16 PM7/24/06
to

That's true! But, I'm talking about the mean average here - not the
median. The mean average is 6 rolls of the dice before a six is
rolled. In other words, you don't stop rolling the dice after 6 rolls.
You keep rolling until you roll a six. The mean average of all the
rolls taken to land on 6 will be 6 rolls.

The mean average is sum(n*pr(n)), i.e., 1*pr(1) + 2*pr(2) + . . . (for a
success probability of 1/2, for example, this sum works out to 2). If
we are talking about equally probable trials, the average number of
trials before a success is 1/pr(success).

Sean Pitman
www.DetectingDesign.com
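Spelling that sum out numerically for a fair die - reading pr(n) as the
chance that the first six arrives on roll n, i.e. (5/6)^(n-1) * (1/6) - the
series does converge to 1/pr(success) = 6:

p = 1 / 6
mean_rolls = sum(n * (1 - p) ** (n - 1) * p for n in range(1, 10_000))
print(mean_rolls)   # 5.999999..., i.e. 6 rolls on average for the first six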

Seanpit

unread,
Jul 24, 2006, 8:12:30 PM7/24/06
to

What is the mean average number of rolls?

Sean Pitman
www.DetectingDesign.com

Seanpit

unread,
Jul 24, 2006, 8:22:54 PM7/24/06
to

A single die and the mean number of rolls it would take to get a six
rolled.

Seanpit

unread,
Jul 24, 2006, 8:21:32 PM7/24/06
to

That's not the question I'm asking here.

> >For example, how many times would you have
> >to roll a dice to get the side with six dots on it? You'd have to roll
> >the dice 6 times, on average.
>
> No, that's just wrong. Have you never rolled a six on the first try?

Sure, but it is also possible to fail to roll a six even after a
million tries. This puts the mean average at 6 rolls to get a 6.

> >Are you with me so far?
>
> Nope.

What is the mean average number of rolls of a dice needed to get a six?

> >Now, how many
> >times would you have to roll the dice, on average, to get "6" twice?
> >You'd have to roll the dice 12 times. That's right, only 12 times.
>
> No, that's not true either.

What's the mean average then?

> >Do
> >the experiment yourself and you will see that this is true. If you
> >roll the dice 12 times, on average, you will roll the number six twice
> >per set of 12 rolls.
>
> That's a different experiment.

This is what this entire discussion is about - -

> >> >RF
> >> >
> >> ><snipped>

snex

unread,
Jul 24, 2006, 8:33:08 PM7/24/06
to

the probability that you will roll a 6 within N trials can be calculated
by the following recursive formula:

P(N) = (5/6)^(N-1) * P(N-1)

where P(1) = 1/6.

by solving for 0.5 (the point at which 50% of the time you would have
rolled a 6), you get slightly under 4, as stated by gerry.

>
> Sean Pitman
> www.DetectingDesign.com

snex

unread,
Jul 24, 2006, 8:40:42 PM7/24/06
to

snex wrote:
> Seanpit wrote:
> > Gerry Murphy wrote:
> > > "Seanpit" <seanpi...@naturalselection.0catch.com> manifested his
> > > innumeracy in message
> > > news:1153754778.5...@i42g2000cwa.googlegroups.com...
> > >
> > > <snip>
> > > > The odds that they will occur in a given set of a sum of their
> > > > independent odds is 50:50. For example, how many times would you have
> > > > to role a dice to get the side with six dots on it? You'd have to roll
> > > > the dice 6 times, on average.
> > >
> > > No, you have to roll it just under 4 times, on average.
> > >
> > > You really should stop embarrassing yourself with this display of
> > > breathtaking ignorance.
> >
> > What is the mean average number of rolls?
>
> the probability that you will rolls 6 after N trials can be calculated
> by the following recursive formula:
>
> P(N) = (5/6)^(N-1) * P(N-1)

correction: P(N) = (1/6)*((5/6)^(N-1)) + P(N-1)

i hate copying math from paper where it looks nice into a computer
where i have to make it conform to ascii. :/
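
The corrected recursion is just the cumulative geometric distribution,
P(N) = 1 - (5/6)^N. A small Python sketch (illustrative only) of where
it crosses 0.5, the "just under 4" figure:

import math

p_cum = 0.0
for n in range(1, 7):
    p_cum += (1.0 / 6.0) * (5.0 / 6.0) ** (n - 1)   # P(N) = P(N-1) + (1/6)*(5/6)**(N-1)
    print(n, round(p_cum, 4))                       # crosses 0.5 between N = 3 and N = 4

# Continuous solution of 1 - (5/6)**N = 0.5:
print(math.log(0.5) / math.log(5.0 / 6.0))          # ~3.8018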

Seanpit

unread,
Jul 24, 2006, 8:45:55 PM7/24/06
to

Dave...@aol.com wrote:
> Seanpit wrote:
> > Richard Forrest wrote:
> > > Seanpit wrote:
> > > <snipped>
> > > >
> > > > The probabilities of two independent events occurring is the sum of the
> > > > probabilities of each one occurring - not the product.
> > > >
> > >
> > > For crying out loud, Sean, learn some basic mathematics.
> > >
> > > You are making an argument based on probabilities, yet are so freaking
> > > ignorant about the mathematics of probability that you make an
> > > assertion as stupid as this?
> > > Get real.
> >
> > Oh really? If I am so far off base, Richard, please do tell me how
> > many picks I'd have to make, on average, before I draw out the one
> > black marble from 1 million marbles two times? Or, how many times would
> > I have to play the California lottery to win it twice with odds of 1 in
> > 250 million?
>
> Anyone with a shred of sense would
> be able to find out that the
> probability of two independent events
> occurring absolutely has to be the product,
> rather than the sum.
>
> Take the die rolling example (standard 6-sided and a fair die).
> The odds of not rolling a six on any given throw is 5/6.

Right . . .

> By your argument, the odds of not
>rolling a six on two throws would be
> 10/6, since you are using the sum, not
> the product. But since probability of
> any event cannot be greater than 1,
> we are left with the notion that your
> proposition is false.

You are talking about two back-to-back throws. I'm not. I'm talking
about rolling a six at any time in a series of throws - or two sixes at
any time in a series of throws. The mean average is the sum(n*pr(n)).
If we are talking about equally probable trials, the mean average number
of trials before a success is 1/pr(success).

> And if you don't know basic
> combinatorics (and this is stuff that is
> learned in high-school level math classes!),
> how we can take you at your word
> on anything mathematical?

I've yet to see you explain how I'm wrong in my examples . . .

> And to follow:
>
> No probability combinations use the sum strictly.
>
> Event A AND Event B occurring use P(A)*P(B).
>
> Event A OR Event B occurring use P(A)+P(B)-(P(A)*P(B)). (The last term
> is to prevent double-counting.)

Again, the mean average number of equally probable trials before
success is 1/prob of success. And, the mean average number of trials
before success for 2 successes is (1/pr success) + (1/pr success).

Sean Pitman
www.DetectingDesign.com
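
A rough Monte Carlo check of that additive expectation, using the die
example (an illustrative Python sketch; the helper name and the
100,000-trial count are arbitrary choices, not from the thread):

import random

def rolls_until_two_sixes():
    # Keep rolling one die; stop once a six has come up twice (not necessarily in a row).
    rolls = sixes = 0
    while sixes < 2:
        rolls += 1
        if random.randint(1, 6) == 6:
            sixes += 1
    return rolls

trials = 100_000
print(sum(rolls_until_two_sixes() for _ in range(trials)) / trials)  # ~12, i.e. 1/p + 1/p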

Seanpit

unread,
Jul 24, 2006, 8:48:14 PM7/24/06
to

Isn't that the median average rather than the mean?

> > > Sean Pitman
> > > www.DetectingDesign.com

Seanpit

unread,
Jul 24, 2006, 8:51:56 PM7/24/06
to

What is the mean average number of draws I'd have to make? Aren't you
talking about the median average here?

Seanpit

unread,
Jul 24, 2006, 9:17:02 PM7/24/06
to
> 10^10 + 10^10 = 20^20 [actually wrote 20^10]

Yeah yeah - It is clear from the rest of the post that I meant 1e10 +
1e10 = 2e10. Brain freeze or something here?

> The probability of two independent
> events occuring is the sum of the
> probabilites of each occuring.

Again, the mean average number of equally probable trials before
success is 1/prob of success. And, the mean average number of trials
before success for 2 successes is (1/pr success) + (1/pr success).

> "Two independend events" = "Two


> occurences of the same event in a
> series of trials".

I never said this. However, two independent events may have the same
odds of success - as with an attempt to pull a black marble out of a
box of 1 million marbles where only 1 is black, twice. The number of
picks to success, each time, can be thought of as an independent event.
The number of picks needed to realize both independent events, on mean
average, is (1/1e-6) + (1/1e-6) = 2e6.

> 1/6 + 1/6 = 1/12 (I especially liked this
> one, even if he didn't quite
> come out and say it in as many words).

I never said this either. What I said was the average number of
attempts needed to achieve success, the mean average, is the sum of the
number of attempts needed to achieve success for each of the parts of
your protein system. Relatively speaking this total number is very
close to the number of attempts needed to achieve success for the most
complex subpart in your cascading system - and nowhere near the number
of attempts needed, on mean average, to find a sequence that could give
rise to a system of function where each subpart must be specifically
oriented relative to all the other subparts in space.

Sean Pitman
www.DetectingDesign.com

Von R. Smith

unread,
Jul 24, 2006, 9:20:36 PM7/24/06
to


Right, which contradicts the statement that the probability of two
successes is p(x) + p(x). If this probability were actually 2*p(x), as
you have repeatedly claimed, then the mean number of trials for two
successes would be half of (1/pr success), not double it. That would
mean that, for example, while you would need an average of 4 rolls to
get your first six, you would need an average of only 2 or 3 rolls
to get two sixes, or that you would need to play the California lottery
approximately 125 million times to win it twice. It sounds completely
absurd. It *is* completely absurd. And unfortunately it is what you
are claiming when you say that the probability of two successes is the
sum of the probabilities of two individual successes.

It also has nothing to do with the probability of two independent
events occurring together.

Seanpit

unread,
Jul 24, 2006, 9:22:46 PM7/24/06
to

It does model what I'm talking about very well. It blows your notion
that cascading systems are even remotely comparable to the specificity
required by functional systems where each part of the system must be
specifically oriented with all the other parts in the system.

> Note: Actually, as others have pointed out, this isn't actually
> correct, either, but that isn't really the main problem with your
> argument. Besides, I was too lazy to figure up a bunch of factorials.

Factorials have nothing to do with figuring the mean averages here.
Those who are trying to tell me I'm so far off base with my numbers are
actually arguing for a median average - and are completely forgetting
about the mean.

Sean Pitman
www.DetectingDesign.com

snex

unread,
Jul 24, 2006, 9:39:02 PM7/24/06
to

no, and i dont even need to delve into the mathematics to explain why.

first of all, there is no median, because the domain is infinite.
medians only exist when there are a finite number of trials, but when
determining how many rolls it takes to get a 6, you could theoretically
roll forever and never get one.

secondly, the domain consists only of integers, and the average is
~3.80178402, which is not an integer. the only average of an integer
set that can be a non-integer is the mean.

if you want to delve into the math, youll have to convert my recursive
formula into a geometric series, and then use the mean value theorem
integral. my bet is that youll get ~3.80178402.

if you still arent convinced, i can write a program that lets you
choose the number of trials and tells you the mean, median, and mode
for the number of rolls required to roll a 6.

>
> > > > Sean Pitman
> > > > www.DetectingDesign.com

Von R. Smith

unread,
Jul 24, 2006, 9:55:20 PM7/24/06
to


It was your mistake. You tell me.

>
> > The probability of two independent
> > events occuring is the sum of the
> > probabilites of each occuring.
>
> Again, the mean average number of equally probable trials before
> success is 1/prob of success. And, the mean average number of trials
> before success for 2 successes is (1/pr success) + (1/pr success).


Which directly contradicts your claim about summing the probabilities
of the individual successes. If the probability of the two events were
the sum of the probabilities of two individual events, then the mean
number of trials for two successes should be half of (1/pr success), not
double it. For example, the probability of rolling a 6 on a die is
1/6. 1/6 + 1/6 = 1/3. This according to you, would be the
probability of rolling two sixes. It would then take me on average
only two or three trials to get two sixes, rather than 12.

>
> > "Two independend events" = "Two
> > occurences of the same event in a
> > series of trials".
>
> I never said this.


Sure you did. You did it again in the paragraph just above. In trying
to make a point about the probability of two independent events, you
modeled two occurrences of the same event in a series of trials. That's
only justified if they are mathematically equivalent. Hence the = sign
between them.


> However, two independent events may have the same
> odds of success - as with an attempt to pull a black marble out of a
> box of 1 million marbles were only 1 in black twice. The number of
> picks to success, each time, can be thought of as an independent event.
> The number of picks needed to realize both independent events, on mean
> average, is (1/1e-6) + (1/1e-6) = 2e6.


Right. So you are *knowingly* equating apples and oranges. And this
makes your argument less stupid how?


>
> > 1/6 + 1/6 = 1/12 (I especially liked this
> > one, even if he didn't quite
> > come out and say it in as many words).
>
> I never said this either.

Sure you did.

- You claimed that the probability of two independent events is the sum
of the probabilities of each event.

- You obviously think that the average number of trials needed to
obtain a result is the reciprocal of its probability.

- You pointed out that it takes an average of 12 trials to get two
sixes. The reciprocal of which is 1/12. Which according to you is the
probability of two independent events each with a probability of 1/6.

- Hence, you said that 1/6 + 1/6 = 1/12.

You may not have realized you were saying this at the time. You might
not like the fact that you said it now that you realize how stupid it
is. But you did say this.


> What I said was the average number of
> attempts needed to achieve success, the mean average, is the sum of the
> number of attempts needed to achieve success for each of the parts of
> your protein system.

> Relatively speaking this total number is very
> close to the number of attempts needed to achieve success for the most
> complex subpart in your cascading system - and nowhere near the number
> of attempts needed, on mean average, to find a sequence that could give
> rise to a system of function were each subpart must be specifically
> oriented relative to all the other subparts in space.


That follows if and only if "two independent events" = "two occurences
of the same event in a series of trials". Which you insist you never
said.

Seanpit

unread,
Jul 24, 2006, 10:00:23 PM7/24/06
to

You are the one trying to argue that the number of trials needed for
success of independent events is (1/pr success) * (1/pr success) -
which is mistaken. You are multiplying when you should be adding.

Again, a 12aa function, where all the residues are fully specified
relative to each other is not remotely in the same ballpark as a
function that requires two 6aa residues that are also fully specified
within each protein, but without the need for the proteins themselves
to be specified relative to each other.

The mean average number of novel 12aa sequences that must be searched
through before success is realized is 20^12 = 4e15. The mean average
number of sequences that must be searched through before success is
realized for a 6aa residue is only 6.4e7. The mean average number of
sequences that must be searched through before two specific 6aa
residues are discovered is 6.4e7 + 6.4e7 = 1.28e8 (not even remotely
close to 4e15).

> If this probability were actually 2*p(x), as
> you have repeatedly claimed, then the mean number of trials for two
> successes would be half of (1/pr success), not double it.

If that is how you understood what I said, it is certainly not how I
meant it. I apologize for the confusion, but the point remains the
same. The mean number of trials for two successes (each having the
same odds) is the sum of the two - not the product. You don't multiply
the number of trials for each success. You add them.

> That would
> mean that, for example, while you would need an average of 4 rolls to
> get your first six, you would only need an average of only 2 or 3 rolls
> to get two sixes, or that you would need to play the California lottery
> approximately 125 million times to win it twice. It sounds completely
> absurd. It *is* completely absurd. And unfortunately it is what you
> are claiming when you say that the probability of two successes is the
> sum of the probabilities of two individual successes.

Again, the number of trials needed for two successes is the sum of the
number of trials needed for each success. You cannot multiply them as you are
evidently attempting to do with your 2,4-DNT example. You add them.

> It also has nothing to do with the probability of two independent
> events occuring together.

Yes, it does. The mean average number of random mutations needed to
find two fully specified 6aa sequences is exponentially lower than the
mean average number needed to find just one fully specified 12aa
sequence.

Same size, all parts fully specified, big difference in overall
specificity.

Sean Pitman
www.DetectingDesign.com
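
The arithmetic behind those figures, as a quick Python check (20
amino-acid alternatives per position is the post's own assumption; this
only reproduces the numbers quoted, not the underlying model):

print(20 ** 12)       # 4096000000000000, ~4e15 sequences of length 12
print(20 ** 6)        # 64000000, 6.4e7 sequences of length 6
print(2 * 20 ** 6)    # 128000000, 1.28e8 for two length-6 sequences found independently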

Von R. Smith

unread,
Jul 24, 2006, 10:04:46 PM7/24/06
to


Talk to those math instructors at Loma Linda. Then get back to me.

Augray

unread,
Jul 24, 2006, 10:16:44 PM7/24/06
to
On 24 Jul 2006 17:21:32 -0700, "Seanpit"
<seanpi...@naturalselection.0catch.com> wrote in
<1153786892....@75g2000cwc.googlegroups.com> :

Nevertheless, it's a very illustrative example. Another example is
rolling two dice at a time. The chance of getting two 6s is 1 in 36,
but according to you it would be 2 in 6.


>> >For example, how many times would you have
>> >to role a dice to get the side with six dots on it? You'd have to roll
>> >the dice 6 times, on average.
>>
>> No, that's just wrong. Have you never rolled a six on the first try?
>
>Sure, but it is also possible to fail to roll a six even after a
>million tries. This puts the overall odds, the mean average, at 6
>rolls to get a 6.

Please show your math.


>> >Are you with me so far?
>>
>> Nope.
>
>What is the mean average number of rolls of a dice needed to get a six?

Between 3 and 4.


>> >Now, how many
>> >times would you have to roll the dice, on average, to get "6" twice?
>> >You'd have to roll the dice 12 times. That's right, only 12 times.
>>
>> No, that's not true either.
>
>What's the mean average then?

I suggest you actually do it, and find out.


>> >Do
>> >the experiment yourself and you will see that this is true. If you
>> >roll the dice 12 times, on average, you will roll the number six twice
>> >per set of 12 rolls.
>>
>> That's a different experiment.
>
>This is what this entire discussion is about - -

Then you should make up your mind as to what it's about. First you ask
for the average number of rolls to get 6 twice, and then you ask for
the number of times 6 will appear in runs of 12 rolls each. These are
two different questions.


>> >> >RF
>> >> >
>> >> ><snipped>

Seanpit

unread,
Jul 24, 2006, 11:27:51 PM7/24/06
to

I agree that the median average is around 4, but I'm still not sure
about the mean average - because of the skewed nature of the curve. It
seems to me that the mean wouldn't be the same as the median in this
case. It seems to me like it should be 6.

"The theoretical distribution for large numbers of throws of unbiased
dice has: mode = 1 median = 4 (The mean, not calculated here, is 6.)"

http://www.rsscse.org.uk/pose/level1/book5/notes.htm

> secondly, the domain consists only of integers, and the average is
> ~3.80178402, which is not an integer. the only average of an integer
> set that can be a non-integer is the mean.
>
> if you want to delve into the math, youll have to convert my recursive
> formula into a geometric series, and then use the mean value theorem
> integral. my bet is that youll get ~3.80178402.
>
> if you still arent convinced, i can write a program that lets you
> choose the number of trials and tells you the mean, median, and mode
> for the number of rolls required to roll a 6.

That would be great.

>
> >
> > > > > Sean Pitman
> > > > > www.DetectingDesign.com

snex

unread,
Jul 24, 2006, 11:34:56 PM7/24/06
to

eh, well the program i wrote seems to be agreeing with you. mean keeps
coming up 6 and median 4 (when n-trials is large).

>
> >
> > >
> > > > > > Sean Pitman
> > > > > > www.DetectingDesign.com
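
A sketch of the kind of program described (assumed Python; the helper
name and the 100,000-sample count are arbitrary illustrative choices):

import random
from statistics import mean, median, mode

def rolls_to_first_six():
    n = 0
    while True:
        n += 1
        if random.randint(1, 6) == 6:
            return n

samples = [rolls_to_first_six() for _ in range(100_000)]
print(mean(samples), median(samples), mode(samples))  # ~6.0, 4, 1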

Seanpit

unread,
Jul 24, 2006, 11:50:25 PM7/24/06
to

Hmmmm . . . So, I'm not crazy after all? ; )

>
> >
> > >
> > > >
> > > > > > > Sean Pitman
> > > > > > > www.DetectingDesign.com

Seanpit

unread,
Jul 24, 2006, 11:49:29 PM7/24/06
to

Augray wrote:

< snip >

> >> >For example, how many times would you have
> >> >to role a dice to get the side with six dots on it? You'd have to roll
> >> >the dice 6 times, on average.
> >>
> >> No, that's just wrong. Have you never rolled a six on the first try?
> >
> >Sure, but it is also possible to fail to roll a six even after a
> >million tries. This puts the overall odds, the mean average, at 6
> >rolls to get a 6.
>
> Please show your math.

The median number of rolls needed to get a six is just under 4 rolls.
In other words, 50% of the time it will take fewer rolls and 50% of the
time more before a six is realized. However, the 50% that take more may
take a whole lot more - up to infinity. This results in a skewed graph
where the mean, median, and mode are not the same. The mean will have a
greater value than the median - or so it seems to me. In fact, it seems
to me that the mean will be 6 in this case while the mode will be just
under 4.

The theoretical distribution for large numbers of throws of unbiased
dice has: mode = 1 median = 4 (The mean, not calculated here, is 6.)

http://www.rsscse.org.uk/pose/level1/book5/notes.htm

> >> >Are you with me so far?


> >>
> >> Nope.
> >
> >What is the mean average number of rolls of a dice needed to get a six?
>
> Between 3 and 4.

Are both the mean and median between 3 and 4?

> >> >Now, how many
> >> >times would you have to roll the dice, on average, to get "6" twice?
> >> >You'd have to roll the dice 12 times. That's right, only 12 times.
> >>
> >> No, that's not true either.
> >
> >What's the mean average then?
>
> I suggest you actually do it, and find out.

> >> >Do
> >> >the experiment yourself and you will see that this is true. If you
> >> >roll the dice 12 times, on average, you will roll the number six twice
> >> >per set of 12 rolls.
> >>
> >> That's a different experiment.
> >
> >This is what this entire discussion is about - -
>
> Then you should make up your mind as to what it's about. First you ask
> for the average number of rolls to get 6 twice, and then you ask for
> the number of times 6 will appear in runs of 12 rolls each. These are
> two different questions.

They are very much related to the notion that two short sequences can
be found in sequence space just as easily as one long sequence of a
size equivalent to the sum of the two short sequences.

> >> >> >RF
> >> >> >
> >> >> ><snipped>

Seanpit

unread,
Jul 25, 2006, 12:03:43 AM7/25/06
to

Desertphile wrote:

> Seanpit wrote:
>
> > Or, how many times would
> > I have to play the California lottery to win it twice with odds of 1 in
> > 250 million?
>
> P.S. Eight billion times.

P.S. Think again. The mean average is 500 million tries to win twice.
The median average is even less.

Von R. Smith

unread,
Jul 25, 2006, 12:47:12 AM7/25/06
to


Sean, show your posts in this thread to the math instructors at Loma
Linda, and see what they have to say. Ask them how *they* calculate
the probability of two independent events occurring. Then get back to
me with what they said. I'd be interested to hear.

Earle Jones

unread,
Jul 25, 2006, 1:16:20 AM7/25/06
to
In article <1153802832....@h48g2000cwc.googlegroups.com>,

*
It should be obvious to you: If the probability of event 1 is 0.6
and the probability of event 2 is 0.7, then the probability of
occurrence of both events is 1.3 -- the mathematics proves it.

earle
*

"If one man can dig a post hole in 60 seconds, then 60 men can dig
a post hole in one second."

--(paraphrasing) Ambrose Bierce

ej
*

Marc

unread,
Jul 25, 2006, 2:09:51 AM7/25/06
to

Von R. Smith wrote:
> Seanpit wrote:

>
> Rather than write a long response detailing what is wrong with this, I
> have a better idea. Sean, you have math instructors at Loma Linda,
> right? Why don't you ask them if your math claims above make any
> sense. Let us know what they say. And I don't mean vague
> characterization, either; I want to know exactly what you asked them
> and exactly what they said. Math doesn't paraphrase well.


Have a look at this blog article

http://scienceblogs.com/goodmath/2006/07/debunking_a_mathematicians_vie.php

I think it touches on some of Sean's approach to evolution.

(signed) marc

.

Gerry Murphy

unread,
Jul 25, 2006, 5:16:54 AM7/25/06
to

"Seanpit" <seanpi...@naturalselection.0catch.com> wrote in message
news:1153788716.3...@p79g2000cwp.googlegroups.com...
>
<snip>

> What is the mean average number of draws I'd have to make?

Where the hell do you get this nomenclature, "mean average"?
You're using the same textbook as nando, aren't you?


David Wilson

unread,
Jul 24, 2006, 2:41:44 PM7/24/06
to
In article <1153715547.8...@i3g2000cwc.googlegroups.com> on
July 23rd in talk.origins "Seanpit"
<seanpi...@naturalselection.0catch.com> wrote:

> ... [snip] ...


>
> > The probability of two independent events occuring is the product of
> > the probabilities of each one occuring. Period. It doesn't matter
> > whether you're talking about enzymes in a cascade, structural proteins
> > in a flagellum, or lottery numbers.
>

> The probabilities of two independent events occurring is the sum of the
> probabilities of each one occurring - not the product.
>

No. In probability theory, the very _definition_ of independence for
two events is that the probability of their conjunction (i.e. the
probability of them both occurring) be the product of their probabilities.
This is one of the most fundamental concepts of elementary probability
theory, and you will find it explained in any textbook on the subject.
For an online account, see the Wikipedia article at:
<http://en.wikipedia.org/wiki/Statistical_independence>

> For example, let's say that we have a box of 1 million marbles with
> only 1 of them being black. If each marble has equal chances of being
> picked, on average, how many times will I have to draw out a marble,
> and put it back in random order, before I pick the black one? 1
> million - right? How many times before I pick the black marble twice?
> By your logic I'd have to draw out the marbles 1e6 * 1e6 = 1e12 or 1

> trillion times before I'd pull out that black marble twice. ...

No. What you're calculating here is _not at all_ the probability of "two
independent events occurring"; it's the _mean_ of the _sum of two random
variables_, a completely different concept.

------------------------------------------------------------------------
David Wilson

SPAMMERS_fingers@WILL_BE_fwi_PROSECUTED_.net.au
(Remove underlines and upper case letters to obtain my email address.

Seanpit

unread,
Jul 25, 2006, 9:37:12 PM7/25/06
to

That's right. It is this concept that I'm trying to get Von to
understand. The mean number of draws it would take to pull the marble
out twice, in this particular example, is 1e6 + 1e6 = 2e6 draws.

Augray

unread,
Jul 25, 2006, 9:43:59 PM7/25/06
to
On 24 Jul 2006 20:49:29 -0700, "Seanpit"
<seanpi...@naturalselection.0catch.com> wrote in
<1153799369.1...@p79g2000cwp.googlegroups.com> :

>Augray wrote:
>
>< snip >
>
>> >> >For example, how many times would you have
>> >> >to role a dice to get the side with six dots on it? You'd have to roll
>> >> >the dice 6 times, on average.
>> >>
>> >> No, that's just wrong. Have you never rolled a six on the first try?
>> >
>> >Sure, but it is also possible to fail to roll a six even after a
>> >million tries. This puts the overall odds, the mean average, at 6
>> >rolls to get a 6.
>>
>> Please show your math.
>
>The median number or rolls needed to get a six is just under 4 rolls.
>In other words, 50% of rolls will be less and 50% more before six is
>realized. However, the 50% of rolls that are more may be a whole lot
>more - up to infinity.

So much for figuring out the median.


>This results in skewed graph where the mean,
>median, and mode are not the same. The mean will have a greater value
>than the median - or so it seems to me. In fact, it seems to me that
>the mean will be 6 in this case while the mode will be just under 4.
>
>The theoretical distribution for large numbers of throws of unbiased
>dice has:mode = 1

How on Earth can one calculate the mode if one has a random
distribution?


> median = 4 (The mean, not calculated here, is 6.)

So, in other words, you can't actually show your math.


>http://www.rsscse.org.uk/pose/level1/book5/notes.htm
>
>> >> >Are you with me so far?
>> >>
>> >> Nope.
>> >
>> >What is the mean average number of rolls of a dice needed to get a six?
>>
>> Between 3 and 4.
>
>Are both the mean and median between 3 and 4?

I really think you need to figure out what is meant by "mean",
"median" and "average".


>> >> >Now, how many
>> >> >times would you have to roll the dice, on average, to get "6" twice?
>> >> >You'd have to roll the dice 12 times. That's right, only 12 times.
>> >>
>> >> No, that's not true either.
>> >
>> >What's the mean average then?
>>
>> I suggest you actually do it, and find out.
>
>> >> >Do
>> >> >the experiment yourself and you will see that this is true. If you
>> >> >roll the dice 12 times, on average, you will roll the number six twice
>> >> >per set of 12 rolls.
>> >>
>> >> That's a different experiment.
>> >
>> >This is what this entire discussion is about - -
>>
>> Then you should make up your mind as to what it's about. First you ask
>> for the average number of rolls to get 6 twice, and then you ask for
>> the number of times 6 will appear in runs of 12 rolls each. These are
>> two different questions.
>
>They are very much related to the notion that two short sequences can
>be found in sequence space just as easily as one long sequence of a
>size equivalent to the sum of the two short sequences.

Nevertheless, they are two different questions. You should decide
which one you're asking.


>> >> >> >RF
>> >> >> >
>> >> >> ><snipped>

Von R. Smith

unread,
Jul 25, 2006, 10:04:47 PM7/25/06
to

Sean, you claimed the following:

"The probabilities of two independent events occurring is the sum of
the
probabilities of each one occurring - not the product."

Is that statement correct or incorrect?.

Seanpit

unread,
Jul 25, 2006, 10:11:20 PM7/25/06
to

Von R. Smith wrote:

> > > > Again, the mean average number of equally probable trials before
> > > > success is 1/prob of success. And, the mean average number of trials
> > > > before success for 2 successes is (1/pr success) + (1/pr success).
> > >
> > > Right, which contradicts the statement that the probability of two
> > > success is p(x) + p(x).
> >
> > You are the one trying to argue that the number of trials needed for
> > success of independent events is (1/pr success) * (1/pr success) -
> > which is mistaken. You are multiplying when you should be adding.
>
> Sean, show your posts in this thread to the math instructors at Loma
> Linda, and see what they have to say. Ask them how *they* calculate

> the probability of two independent events occurring. Then get back to


> me with what they said. I'd be interested to hear.

Yes, I was wrong in my description of realizing two independent events.
However, my basic concept of the problem in question here was correct.
Yours seems to be mistaken. What I meant to describe is the number of
trials needed to realize two independent events. This number is the sum
the inverse probabilities of realizing each event. Given this, your
notions about the specificity of cascading systems flies out the
window.

You are the one who is basically trying to argue that finding 4 or 5
short sequences is just as hard as finding one long sequence of
equivalent total length. This simply isn't true - as I have been
saying all along.

For example, let's say that a particular cascading system requires 5
absolutely specified residues. Let's say that the size of each is:
150aa, 200aa, 300aa, 350aa, and 500aa. How many searches, on mean
average, would it take to find all of them?

(1/1e-195) + (1/1e-260) + (1/1e-390) + (1/1e-455) + (1/1e-650) = ~1e650
searches.

Now, compare this with the mean average number of searches it would
take to find just one sequence of 1500aa (1/1e-1951) = 1e1951. Not
even remotely comparable - right?

Are you starting to see the problem here? As I said originally, the
mean average time needed to find all the enzymes in an enzymatic
cascade is basically the same as the average time needed to find the
single most complex enzyme in the cascade. You argued that the time
needed would be pretty much the same as that needed to find all the
proteins in a system requiring the same total minimum number of
residues even regardless of the fact that this system also requires
each protein part to be specifically specified relative to all the
other protein parts in that system.

As you can see, this simply isn't true. A system that does not require
overall specificity of arrangement of all its parts is not anywhere
near the degree of specificity of one that does. This translates into
a marked increase in the amount of average time needed to evolve the
system that has the overall specificity requirement compared to the one
that does not - even given the same minimum size requirement.

Sean Pitman
www.DetectingDesign.com
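
Worked in base-10 logarithms, the figures quoted above come out as
follows (a Python sketch that only reproduces the post's arithmetic;
20^n possible sequences of length n is the post's own assumption):

import math

log20 = math.log10(20)                      # ~1.301, so 20**n = 10**(n * log20)
lengths = [150, 200, 300, 350, 500]

for n in lengths:
    print(n, "aa -> 10^%.1f sequences" % (n * log20))

total = sum(20 ** n for n in lengths)       # exact integer sum of the five searches
print("sum    -> 10^%.1f" % math.log10(total))   # ~10^650.5, dominated by the 500aa term
print("1500aa -> 10^%.1f" % (1500 * log20))      # ~10^1951.5 for one 1500aa sequence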

Seanpit

unread,
Jul 25, 2006, 10:15:29 PM7/25/06
to

Rather, it is you who needs to figure out the difference between mean,
median, and mode here. The mode, in this case, is 1, the median just
shy of 4, and the mean is 6. Look it up. And, if you don't believe
me, do the experiment yourself or, better yet, get snex to explain it
to you.

> >> >> >Now, how many
> >> >> >times would you have to roll the dice, on average, to get "6" twice?
> >> >> >You'd have to roll the dice 12 times. That's right, only 12 times.
> >> >>
> >> >> No, that's not true either.
> >> >
> >> >What's the mean average then?
> >>
> >> I suggest you actually do it, and find out.

The answer is 12. That is the mean average number of dice throws it
would take to get six twice.

Seanpit

unread,
Jul 25, 2006, 10:27:36 PM7/25/06
to

Von R. Smith wrote:

> > > > > What you are trying to calculate is the number of rolls needed to roll
> > > > > six twice in a row. The answer to that question is 36 rolls. That's
> > > > > not the question we are trying to answer in this case Von. The
> > > > > question at hand is how many rolls will it take to get two sixes. It
> > > > > makes a big difference.
> > > >
> > > > Sean, this is correct, but it doesn't model anything relevant.
> >
> > It does model what I'm talking about very well. It blows your notion
> > that cascading systems are even remotely comparable to the specificity
> > required by functional systems where each part of the system must be
> > specifically oriented with all the other parts in the system.
>
> Talk to those math instructors at Loma Linda. Then get back to me.

The main question here is, how long will it take to produce a cascading
enzymatic system vs. a system of the same minimum size and individual
specificity of its individual parts that also requires overall
specificity of each part with every other part in the system? That is
the main question here. You are trying to argue that it would be
equally easy and/or difficult to achieve one as well as the other since
they both have the same size and individual specificities of their
individual parts. What you fail to realize is that the additional
specificity requirement, that each part be specifically oriented
relative to every other part in the system, dramatically increases the
overall specificity of a system - making it much much much more
difficult to realize in a given period of time (compared to a cascading
system).

For example, let's say that a particular cascading system requires 5
absolutely specified residues. Let's say that the size of each is:
150aa, 200aa, 300aa, 350aa, and 500aa. How many searches, on mean
average, would it take to find all of them?

(1/1e-195) + (1/1e-260) + (1/1e-390) + (1/1e-455) + (1/1e-650) = ~1e650
searches.

Now, compare this with the mean average number of searches it would
take to find just one sequence of 1500aa (1/1e-1951) = 1e1951. Not
even remotely comparable - right?

This is exactly what I said to begin with - that the functional
complexity of a cascading system was pretty much the same as its most
complex subpart.

Sean Pitman
www.DetectingDesign.com

Seanpit

unread,
Jul 25, 2006, 10:35:15 PM7/25/06
to

This statement sure was an unfortunate choice of words on my part. I
knew what I was thinking, but the words obviously didn't come out
right. The basic concept though, the concept I described to you in the
beginning, is correct. The mean average number of trials needed to
gain an entire cascading system is about the same as the mean number
needed to gain the single most complex part of that system.

You are the one who is basically trying to argue that finding 4 or 5
short sequences is just as hard as finding one long sequence of
equivalent total length. This doesn't seem to be true.

For example, let's say that a particular cascading system requires 5
absolutely specified residues. Let's say that the size of each is:
150aa, 200aa, 300aa, 350aa, and 500aa. How many searches, on mean
average, would it take to find all of them?

(1/1e-195) + (1/1e-260) + (1/1e-390) + (1/1e-455) + (1/1e-650) = ~1e650
searches.

Now, compare this with the mean average number of searches it would
take to find just one sequence of 1500aa (1/1e-1951) = 1e1951. Not
even remotely comparable - right?

Are you starting to see the problem here? As I said originally, the
mean average time needed to find all the enzymes in an enzymatic
cascade is basically the same as the average time needed to find the
single most complex enzyme in the cascade.

Seanpit

unread,
Jul 25, 2006, 10:53:35 PM7/25/06
to

Von R. Smith wrote:

< snip >

> > > The probability of two independent
> > > events occuring is the sum of the
> > > probabilites of each occuring.
> >
> > Again, the mean average number of equally probable trials before
> > success is 1/prob of success. And, the mean average number of trials
> > before success for 2 successes is (1/pr success) + (1/pr success).
>
> Which directly contradicts your claim about summing the probabilities
> of the individual successes.

That's true. But, it also contradicts your claim that cascading systems
of function have the same degree of specificity as functions that
require each individual subpart to be specifically oriented relative to
every other part in the system (i.e., a fully specified system of
function) - which is the main point of this discussion. A cascading
system is therefore exponentially easier to evolve than a fully
specified system.

< snip >

> > > "Two independend events" = "Two
> > > occurences of the same event in a
> > > series of trials".
> >
> > I never said this.
>
> Sure you did. You did it again in the paragraph just above. In trying
> to make a point about the probability of two independent events, you
> modeled two occurences of the same event in a series of trials. That's
> only justified if they are mathematically equivalent. Hence the = sign
> between them.

Not true. The mean number of trials needed to achieve several events
is (1/pr success A) + (1/pr success B) + etc. . . The probabilities
of each event do not have to be equal in order for their inverses to be
added together like this.

> > However, two independent events may have the same
> > odds of success - as with an attempt to pull a black marble out of a
> > box of 1 million marbles were only 1 in black twice. The number of
> > picks to success, each time, can be thought of as an independent event.
> > The number of picks needed to realize both independent events, on mean
> > average, is (1/1e-6) + (1/1e-6) = 2e6.
>
> Right. So you are *knowingly* equating apples and oranges. And this
> makes your argument less stupid how?

How am I equating apples and oranges here?

For example, let's say that a particular cascading system requires 5
absolutely specified residues. Let's say that the size of each is:
150aa, 200aa, 300aa, 350aa, and 500aa. How many searches, on mean
average, would it take to find all of them?

(1/1e-195) + (1/1e-260) + (1/1e-390) + (1/1e-455) + (1/1e-650) = ~1e650
searches.

Is this equating apples and oranges?

Now, compare this with the mean average number of searches it would
take to find just one sequence of 1500aa (1/1e-1951) = 1e1951. Not
even remotely comparable - right?

Are you starting to see the problem here? As I said originally, the
mean average time needed to find all the enzymes in an enzymatic
cascade is basically the same as the average time needed to find the
single most complex enzyme in the cascade. You argued that the time
needed would be pretty much the same as that needed to find all the
proteins in a system requiring the same total minimum number of
residues even regardless of the fact that this system also requires
each protein part to be specifically specified relative to all the
other protein parts in that system.

As you can see, this simply isn't true. A system that does not require
overall specificity of arrangement of all its parts is not anywhere
near the degree of specificity of one that does. This translates into
a marked increase in the amount of average time needed to evolve the
system that has the overall specificity requirement compared to the one
that does not - even given the same minimum size requirement.

> > > 1/6 + 1/6 = 1/12 (I especially liked this
> > > one, even if he didn't quite
> > > come out and say it in as many words).
> >
> > I never said this either.
>
> Sure you did.
>
> - You claimed that the probability of two independent events is the sum
> of the probabilities of each event.
>
> - You obviously think that the average number of trials needed to
> obtain a result is the reciprocal of its probability.
>
> - You pointed out that it takes an average of 12 trials to get two
> sixes. The reciprocal of which is 1/12. Which according to you is the
> probability of two independent events each with a probability of 1/6.
>
> - Hence, you said that 1/6 + 1/6 = 1/12.
>
> You may not have realized you were saying this at the time. You might
> not like the fact that you said it now that you realize how stupid it
> is. But you did say this.

The mean average number of rolls of a dice it takes to get 2 sixes is
12 rolls. This is calculated by taking the inverse of the probability
of each event, i.e., 6 in both cases here, and adding them together =
12 rolls of the dice. This is clearly what I did here. How is this
incorrect?

> > What I said was the average number of
> > attempts needed to achieve success, the mean average, is the sum of the
> > number of attempts needed to achieve success for each of the parts of
> > your protein system.
>
> > Relatively speaking this total number is very
> > close to the number of attempts needed to achieve success for the most
> > complex subpart in your cascading system - and nowhere near the number
> > of attempts needed, on mean average, to find a sequence that could give
> > rise to a system of function were each subpart must be specifically
> > oriented relative to all the other subparts in space.
>
> That follows if and only if "two independent events" = "two occurences
> of the same event in a series of trials". Which you insist you never
> said.

I don't understand this to be true. It seems to me that this "follows"
regardless of if "two independent events" = "two occurrences of the
same event in a series of trials" or if the "two independent events"
are not the same and have different odds of success.

Sean Pitman
www.DetectingDesign.com

Seanpit

unread,
Jul 25, 2006, 10:55:57 PM7/25/06
to

There are different types of averages - right? The mean, median, and
mode?

Zachriel

unread,
Jul 26, 2006, 6:47:13 AM7/26/06
to

"Seanpit" <seanpi...@naturalselection.0catch.com> wrote in message
news:1153881314.9...@m73g2000cwd.googlegroups.com...


You are attempting to state the Expected Value. In 1e6 draws, you would
*expect* to pull one black marble. In 2e6 draws, you would *expect* to pull
two black marbles (returning the black marble after each draw).

I haven't read the entire thread, but I assume you are arguing that a
cascade is a ratchet that allows the first black marble to be locked into
place, so that the next 1e6 draws (as it were) would be expected to produce
the second marble ('sum'), and that evolution otherwise requires two
directly consecutive (in effect simultaneous) successful draws to cross the
neutral gap ('product').

Ironically, the ratchet is much like how evolution in general is expected to
work. Each small change adding some selectable value.

By the way, Behe and Snoke showed that reasonable so-called neutral gaps are
no problem for a population of prokaryotes to cross in short order just with
point-mutations, even making some very conservative assumptions. And they
admit that other mechanisms (e.g. homologous recombination) may provide much
faster changes.
http://www.proteinscience.org/cgi/reprint/ps.04802904v1.pdf

Meanwhile, Bogarad & Deem say that "DNA base substitution, in the context of
the genetic code, is ideally suited for the generation, diversification, and
optimization of local protein space", and then provide empirical evidence
and an explanatory model of still other natural mechanisms such "that
organization into higher-order fundamental units such as nucleic acids, the
genetic code, secondary and tertiary structure, cellular
compartmentalization, cell types, and germ layers allows systems to escape
complexity barriers and potentiates explosions in diversity".
http://www.pnas.org/cgi/content/full/96/6/2591

Also, keep in mind that ignorance is not evidence, and we can't make a
scientific assertion of an Intelligent Designer every time there is a Gap in
human knowledge.

--
Zachriel's
Word Mutation and Evolution Experiment
And it takes less than "zillions of years"!
http://www.zachriel.com/mutagenation/
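
The 'sum' versus 'product' contrast can be illustrated with a
scaled-down version of the marble draw (an illustrative Python sketch;
the 1-in-50 odds, the 5,000 repetitions, and the helper names are
arbitrary choices made only so the simulation finishes quickly):

import random

p = 1 / 50   # scaled-down stand-in for the 1-in-1e6 marble

def draws_ratchet():
    # First success is locked in; keep drawing (with replacement) for the second.
    draws = hits = 0
    while hits < 2:
        draws += 1
        if random.random() < p:
            hits += 1
    return draws

def draws_simultaneous():
    # Both successes must land on the same paired draw.
    draws = 0
    while True:
        draws += 1
        if random.random() < p and random.random() < p:
            return draws

n = 5_000
print(sum(draws_ratchet() for _ in range(n)) / n)        # ~100,  i.e. 1/p + 1/p ('sum')
print(sum(draws_simultaneous() for _ in range(n)) / n)   # ~2500, i.e. 1/p**2   ('product')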

Gerry Murphy

unread,
Jul 26, 2006, 8:11:58 AM7/26/06
to

"Seanpit" <seanpi...@naturalselection.0catch.com> wrote in message
news:1153882557.5...@i42g2000cwa.googlegroups.com...

Yes, and that's how we describe them. Tacking on 'average' adds nothing.


Von R. Smith

unread,
Jul 26, 2006, 3:53:35 PM7/26/06
to
Seanpit wrote:
> Von R. Smith wrote:
>
> < snip >
>
> > > > The probability of two independent
> > > > events occuring is the sum of the
> > > > probabilites of each occuring.
> > >
> > > Again, the mean average number of equally probable trials before
> > > success is 1/prob of success. And, the mean average number of trials
> > > before success for 2 successes is (1/pr success) + (1/pr success).
> >
> > Which directly contradicts your claim about summing the probabilities
> > of the individual successes.
>
> That's true. But, it also contradicts your claim that cascading systems
> of function have the same degree of specificity as functions that
> require each individual subpart to be specifically oriented relative to
> every other part in the system (i.e., a fully specified system of
> function) - which is the main point of this discussion.


How does it contradict this, Sean? Your metric of specification (as
you have told me "over and over again") is given by a minimum size
requirement, plus some measurement of sequence constraint, which you
usually give as the proportion of sequences of said minimum size
serving the function in question. I don't see where expected successes
in a number of Bernoulli trials even figure into the picture. More on
this below.


> A cascading
> system is therefore exponentially easier to evolve than a fully
> specified system.
>


Let's test this notion against what you have told me elsewhere. You
have explained to me "over and over again" that specificity is given by
two measurements:

a) A minimum size requirement.
b) A measurement of "specificity", which you typically estimate as the
proportion of all possible sequences of aforementioned minimum size
performing the function in question.


Now, let's consider this in the case of your die rolls.


What is the "minimum size requirement" for coming up with two sixes?
Two rolls.

What is the "specificity" of two sixes (i.e., the proportion of two die
rolls that result in two sixes)? 1/36.

Do your stated criteria identify any other parameters for determining
specificity? No.

Do either of these numbers change if I proceed by rolling one die until
I get a six, then rolling the other die until I get a second six, as
opposed to rolling two dice together until I get boxcars? No.

Does the mean number of trials needed to end up with two sixes change
depending on which selection regime I use? Of course, but that has
nothing to do with specificity as you have defined it "over and over
again", and as described above.

Does any of this tell us anything meaningful about the evolvability of
functions that require a particular "3-D conformation" as opposed to a
cascade? Not as far as I can tell.

The closest I can get to making sense of any of this is to suppose that
you are conceding that cascades involving numerous proteins thousands
of aa long can be built step-by-step through a series of individually
useful intermediates, while trying to suggest that functions requiring
a particular "3D conformation" must proceed via your strawman model of
random-walk-until-everything-is-in-place-all-at-once. But you keep
denying that that is your argument, so you must be trying to get at
something else, although I can't figure out what that might be.


> < snip >
>
> > > > "Two independend events" = "Two
> > > > occurences of the same event in a
> > > > series of trials".
> > >
> > > I never said this.
> >
> > Sure you did. You did it again in the paragraph just above. In trying
> > to make a point about the probability of two independent events, you
> > modeled two occurences of the same event in a series of trials. That's
> > only justified if they are mathematically equivalent. Hence the = sign
> > between them.
>
> Not true. The mean number of trials needed to achieve several events
> is (1/pr success A) + (1/pr success B) + etc. . . The probabilities
> of each event do not have to be equal in order for their inverses to be
> added together like this.


Which does not address what I wrote.


>
> > > However, two independent events may have the same
> > > odds of success - as with an attempt to pull a black marble out of a
> > > box of 1 million marbles were only 1 in black twice. The number of
> > > picks to success, each time, can be thought of as an independent event.
> > > The number of picks needed to realize both independent events, on mean
> > > average, is (1/1e-6) + (1/1e-6) = 2e6.
> >
> > Right. So you are *knowingly* equating apples and oranges. And this
> > makes your argument less stupid how?
>
> How am I equating apples and oranges here?
>
> For example, lets say that a particular cascading system requires 5
> absolutely specified residues. Let's say that the sizes of each is:
> 150aa, 200aa, 300aa, 350aa, and 500aa. How many searches, on mean
> average, would it take to find all of them?
>
> (1/1e-195) + (1/1e-260) + (1/1e-390) + (1/1e-455) + (1/1e-650) = ~1e650
> searches.
>
> Is this equating apples and oranges?
>
> Now, compare this with the mean average number of searches it would
> take to find just one sequence of 1500aa (1/1e-1951) = 1e1951. Not
> even remotely comparable - right?
>
> Are you starting to see the problem here?


Yes, you are now trying to change the subject. You were supposed to be
showing that enzyme cascades only have the same number of
fairly-specified residues as do the largest proteins in them.

By your own stated criteria for measuring specificity, this means that
you have to show me that the proportion of sequences of minimal size
for the entire cascade that actually perform the function of that
cascade is the same as the proportion of the sequences of minimal size
for the largest enzyme that perform the function of that enzyme. I'm
still waiting for a demonstration of this.

So:

1) What proportion of 548aa-long peptide sequences function as a 4M5NC
monoxygenase?

2) What proportion of 1,880aa-long peptide sequences degrade 2,4-DNT
to yield pyruvate?

3) Are your answers to 1) and 2) the same number?

> As I said originally, the
> mean average time needed to find all the enzymes in an enzymatic
> cascade is basically the same as the average time needed to find the
> single most complex enzyme in the cascade. You argued that the time
> needed would be pretty much the same as that needed to find all the
> proteins in a system requiring the same total minimum number of
> residues even regardless of the fact that this system also requires
> each protein part to be specifically specified relative to all the
> other protein parts in that system.


I am not talking about time at all. I am talking about a supposedly
synchronic property: the number of "fairly specified" residues in a
particular cascade, and the number of "fairly specified" base pairs in
the genetic sequences that code for its enzymes. Whether that number,
once calculated, will also say anything meaningful about how long it
takes a particular function to evolve is one of those open issues that
is ever in dispute in our threads.


>
> As you can see, this simply isn't true. A system that does not require
> overall specificity of arrangement of all its parts is not anywhere
> near the degree of specificity of one that does.

The specificity, according to your own rules, is measured by the
proportion of sequences of a hypothetical minimum length that actually
code for/perform the function in question. It is not a measurement of
time. Whether a function of a given specificity could evolve in a
given amount of time was precisely one of the points at issue. You
don't get to rig the discussion by redefining your claims to make them
tautologous.


> This translates into
> a marked increase in the amount of average time needed to evolve the
> system that has the overall specificity requirement compared to the one
> that does not - even given the same minimum size requirement.


Specificity is given by the proportion of sequences of a given size
that perform the function in question, remember? It isn't a
measurement of time. If it were, your whole argument would be nothing
more than an elaborate smokescreen for your incredulity that complex
functions can evolve in the available time.

We can clear this whole thing up if you could just demonstrate why you
think the proportion of 548aa sequences that act as a 4M5NC
monoxygenase is equal to the proportion of 1,888aa sequences that
degrade 2,4-DNT into pyruvic acid.


>
>
> > > > 1/6 + 1/6 = 1/12 (I especially liked this
> > > > one, even if he didn't quite
> > > > come out and say it in as many words).
> > >
> > > I never said this either.
> >
> > Sure you did.
> >
> > - You claimed that the probability of two independent events is the sum
> > of the probabilities of each event.
> >
> > - You obviously think that the average number of trials needed to
> > obtain a result is the reciprocal of its probability.
> >
> > - You pointed out that it takes an average of 12 trials to get two
> > sixes. The reciprocal of which is 1/12. Which according to you is the
> > probability of two independent events each with a probability of 1/6.
> >
> > - Hence, you said that 1/6 + 1/6 = 1/12.
> >
> > You may not have realized you were saying this at the time. You might
> > not like the fact that you said it now that you realize how stupid it
> > is. But you did say this.
>
> The mean average number of rolls of a dice it takes to get 2 sixes is
> 12 rolls. This is calculated by taking the inverse of the probability
> of each event, i.e., 6 in both cases here, and adding them together =
> 12 rolls of the dice. This is clearly what I did here. How is this
> incorrect?


It isn't, but let's not pretend that was all that you claimed.

>
> > > What I said was the average number of
> > > attempts needed to achieve success, the mean average, is the sum of the
> > > number of attempts needed to achieve success for each of the parts of
> > > your protein system.
> >
> > > Relatively speaking this total number is very
> > > close to the number of attempts needed to achieve success for the most
> > > complex subpart in your cascading system - and nowhere near the number
> > > of attempts needed, on mean average, to find a sequence that could give
> > > rise to a system of function were each subpart must be specifically
> > > oriented relative to all the other subparts in space.
> >
> > That follows if and only if "two independent events" = "two occurences
> > of the same event in a series of trials". Which you insist you never
> > said.
>
> I don't understand this to be true. It seems to me that this "follows"
> regardless of if "two independent events" = "two occurrences of the
> same event in a series of trials" or if the "two independent events"
> are not the same and have different odds of success.


If two processes are not the same, then there is no justification for
treating them mathematically as if they were the same.

Von R. Smith

unread,
Jul 26, 2006, 4:16:26 PM7/26/06
to

Seanpit wrote:
> Von R. Smith wrote:
>
> > > > > > What you are trying to calculate is the number of rolls needed to roll
> > > > > > six twice in a row. The answer to that question is 36 rolls. That's
> > > > > > not the question we are trying to answer in this case Von. The
> > > > > > question at hand is how many rolls will it take to get two sixes. It
> > > > > > makes a big difference.
> > > > >
> > > > > Sean, this is correct, but it doesn't model anything relevant.
> > >
> > > It does model what I'm talking about very well. It blows your notion
> > > that cascading systems are even remotely comparable to the specificity
> > > required by functional systems where each part of the system must be
> > > specifically oriented with all the other parts in the system.
> >
> > Talk to those math instructors at Loma Linda. Then get back to me.
>
> The main question here is, how long will it take to produce a cascading
> enzymatic system vs. a system of the same minimum size and individual
> specificity of its individual parts that also requires overall
> specificity of each part with every other part in the system? That is
> the main question here.


No, Sean, the main question is how many of the base pairs in the operon
coding the 2,4-DNT cascade are "fairly specified". This is supposedly
a synchronic property that we can measure in the here and now, and you
have already given parameters for doing this: a minimum size
requirement for a given function, and the proportion of all possible
minimum-sized sequences that perform the function. That you also
believe that this number will tell us something about how long a
particular structure should take to evolve is another matter
altogether.


> You are trying to argue that it would be
> equally easy and/or difficult to achieve one as well as the other since
> they both have the same size and individual specificities of their
> individual parts.


No, I am trying to argue that it is absurd to claim that some general
rule guarantees that the proportion of possible 548aa-long peptides
functioning as 4M5NC monoxygenase is the same as the proportion of
possible 1,888aa-long sequences that degrade 2,4-DNT into pyruvate.


> What you fail to realize is that the additional
> specificity requirement, that each part be specifically oriented
> relative to every other part in the system, dramatically increases the
> overall specificity of a system - making it much much much more
> difficult to realize in a given period of time (compared to a cascading
> system).


If that is so, then the appropriate way to model this is to give lower
proportions of suitable sequences in the minimum sequence space, not to
jettison your own stated criteria just for cascades.

<snip rest>

Richard Forrest

unread,
Jul 27, 2006, 12:30:56 PM7/27/06
to

Seanpit wrote:
> Richard Forrest wrote:
> > Seanpit wrote:
> > > Richard Forrest wrote:
> > > > Seanpit wrote:
> > > > <snipped>
> > > > >
> > > > > The probabilities of two independent events occurring is the sum of the
> > > > > probabilities of each one occurring - not the product.
> > > > >
> > > >
> > > > For crying out loud, Sean, learn some basic mathematics.
> > > >
> > > > You are making an argument based on probabilities, yet are so freaking
> > > > ignorant about the mathematics of probability that you make an
> > > > assertion as stupid as this?
> > > > Get real.
> > >
> > > Oh really? If I am so far off base, Richard, please do tell me how
> > > many picks I'd have to make, on average, before I draw out the one
> > > black marble from 1 million marbles two times? Or, how many times would
> > > I have to play the California lottery to win it twice with odds of 1 in
> > > 250 million?
> >
> > That's not what you were asserting.
>
> Yes, it is what I am asserting. What did you think I was asserting?
>
> > Learn some basic statistics.
>
> Learn how to apply basic statistics . . .

I have and I do.

See my forthcoming papers on a Bayesian approach to analysing the
distribution of bite marks on plesiosaur limb bones, and on the
phylogenetic implications of the results of principal component analysis
of measurements of plesiosaur vertebral centra.

It's a central part of my PhD thesis, so I took the time to make sure
that I have a good understanding of the principles of statistics, and
have discussed my approach with statisticians before embarking on
extensive data collection.

When you make a statement such as "The probabilities of two independent
events occurring is the sum of the probabilities of each one occurring
- not the product" it reveals such a basic lack of understanding of
statistics that the mind boggles that you even think that you can use
statistics in any meaningful way.

So I suggest that you go away and learn something about the subject to
avoid making a fool of yourself again.

RF

>
> >
> > RF
> >
> > >
> > > >
> > > > RF
> > > >
> > > > <snipped>

Josh Hayes

unread,
Jul 27, 2006, 2:25:55 PM7/27/06
to
"Richard Forrest" <ric...@plesiosaur.com> wrote in
news:1154017856....@75g2000cwc.googlegroups.com:

> See my forthcoming papers on a Bayesian approach to analysing the
> distribution of bite marks on plesiosaur limb bones, and on the
> phylogenetic implications of the results of principal component
> analysis of measurements of plesiosaur vertebral centra.

I'm not a Bayesian kind of guy, so I won't comment on that, but when I
was at an ESA (Ecological Society of America) meeting back in,
um...1987? It was at UC Davis -- this grad student gave a perfectly nice
talk involving using PCA, and at the end of the statistical portion he
added, "and these results are significant at p < 0.01".

You could pick out the statisticians in the audience, because we all
blinked and looked around the room to see if anyone else had blinked and
was looking around the room.

(The joke, see, is that PCA is an exploratory technique involving
rotating multivariate data around assumed orthogonal axes in n-
dimensional space where n is the number of variates -- there IS no "null
hypothesis" against which to estimate. It's just useful for squishing
many many variables, which are impossible to visualize, down to, say,
three, with most of the interesting variability preserved, which CAN be
visualized. End of lecture.)
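[A minimal sketch of the "squishing" Josh describes, using NumPy's SVD on
made-up data rather than anyone's real measurements - the point being that
PCA is descriptive, with no p-value in sight:]

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 12))         # 100 specimens, 12 made-up measurements

Xc = X - X.mean(axis=0)                # centre each variable
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

scores = Xc @ Vt[:3].T                 # project onto the first 3 principal axes
explained = (s**2 / np.sum(s**2))[:3]  # fraction of the variance each axis keeps

print(scores.shape)                    # (100, 3): 12 variables squished down to 3
print(explained)                       # descriptive summary only - nothing to "test"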

I suppose one could come up with some distribution tables based on
bootstrapping, but it seems like a lot of work for a not very robust
result.

That said, I'm a pretty fair statistician and when I have the patience
to wade through any of Sean's stuff it's instantly obvious he, well --
isn't. Just to bring this semi-on-topic.

-JAH

Richard Forrest

unread,
Jul 27, 2006, 5:09:01 PM7/27/06
to

Josh Hayes wrote:
> "Richard Forrest" <ric...@plesiosaur.com> wrote in
> news:1154017856....@75g2000cwc.googlegroups.com:
>
> > See my forthcoming papers on a Bayesian approach to analysing the
> > distribution of bite marks on plesiosaur limb bones, and on the
> > phylogenetic implications of the results of principal component
> > analysis of measurements of plesiosaur vertebral centra.
>
> I'm not a Bayesian kind of guy, so I won't comment on that, but when I
> was at an ESA (Ecological Society of America) meeting back in,
> um...1987? It was at UC Davis -- this grad student gave a perfectly nice
> talk involving using PCA, and at the end of the statistical portion he
> added, "and these results are significant at p < 0.01".
>
> You could pick out the statisticians in the audience, because we all
> blinked and looked around the room to see if anyone else had blinked and
> was looking around the room.

I'd have been blinking along with you!

There are too many occasions on which confident conclusions are stated
on a very weak basis of evidence.

>
> (The joke, see, is that PCA is an exploratory technique involving
> rotating multivariate data around assumed orthogonal axes in n-
> dimensional space where n is the number of variates -- there IS no "null
> hypothesis" against which to estimate. It's just useful for squishing
> many many variables, which are impossible to visualize, down to, say,
> three, with most of the interesting variability preserved, which CAN be
> visualized. End of lecture.)
>
> I suppose one could come up with some distribution tables based on
> bootstrapping, but it seems like a lot of work for a not very robust
> result.

I'd regard that as a complete and utter waste of time. I'm very dubious
about the value of bootstrapping methods in most areas of mathematical
analysis. They may give clear results, but the likelihood of them being
a load of bollocks is high.

It would be better in many cases if researchers were to conclude that
the data does not give any conclusive result, but the pressure to
produce clear-cut conclusions acts against that.

>
> That said, I'm a pretty fair statistician and when I have the patience
> to wade through any of Sean's stuff it's instantly obvious he, well --
> isn't. Just to bring this semi-on-topic.
>

I was fortunate that a friend of mine, and a neighbour at the time I
started serious research into plesiosaurs, was Professor of Statistics
at Nottingham University and the author of one of the standard
textbools on Bayesian statistics. He was very helpful in pointing me in
the right direction regarding the use of statistics at an early stage
of my development as a vertebrate palaeontologist.

RF

> -JAH

Von R. Smith

unread,
Jul 27, 2006, 5:42:42 PM7/27/06
to


When we talk about concepts we don't understand, we tend to choose our
words poorly.


> I
> knew what I was thinking, but the words obviously didn't come out
> right. The basic concept though, the concept I described to you in the
> beginning, is correct. The mean average number of trials needed to
> gain an entire cascading system is about the same as the mean number
> needed to gain the single most complex part of that system.
>
> You are the one who is basically trying to argue that finding 4 or 5
> short sequences is just as hard as finding one long sequence of
> equivalent total length. This doesn't seem to be true.


I'm not arguing anything of the sort. We already know that the 2,4-DNT
cascade evolved in no more than a few decades. The question we are
discussing is how many "fairly specified" base pairs are required to
code for it. You have already explained to me "over and over again",
how one does this: it has something to do with the minimum size
requirement, and the degree of constraint on that minimally-sized
sequence if it is to perform a given function.

Now, I know that you really really want your number of "fairly
specified" base pairs (or amino acids, or whatever) to also say
something about how long it would take a structure serving some
function to evolve; I know that you really really want your audience to
believe that this number says something about the size of "neutral
gaps". But your desired conclusion doesn't follow from your stated
premises.


>
> For example, lets say that a particular cascading system requires 5
> absolutely specified residues. Let's say that the sizes of each is:
> 150aa, 200aa, 300aa, 350aa, and 500aa. How many searches, on mean
> average, would it take to find all of them?
>
> (1/1e-195) + (1/1e-260) + (1/1e-390) + (1/1e-455) + (1/1e-650) = ~1e650
> searches.
>
> Now, compare this with the mean average number of searches it would
> take to find just one sequence of 1500aa (1/1e-1951) = 1e1951. Not
> even remotely comparable - right?


Congratulations! You have just shown that your strawman version of
evolution (which you repeatedly claim not to be using, but which always
slips right back into your argument, as it just did above) doesn't look
anything like the way it actually works.
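[For readers trying to follow the exponents in the calculation quoted
above, they can be reproduced directly from its own stated assumptions -
20 possible residues per position, every position fully specified. This is
only a restatement of that arithmetic; whether it models anything about
evolution is exactly what is in dispute:]

import math

# the five hypothetical enzyme sizes from the quoted example, each treated
# as fully specified over 20 possible residues per position
lengths = [150, 200, 300, 350, 500]

mean_searches_each = [20**n for n in lengths]   # 1/p for each part
total_for_cascade = sum(mean_searches_each)     # the "sum of inverses" figure
single_1500aa = 20**1500                        # one fully specified 1500aa chain

print(f"{math.log10(total_for_cascade):.1f}")   # ~650.5, i.e. the "~1e650" quoted
print(f"{math.log10(single_1500aa):.1f}")       # ~1951.5, i.e. the "1e1951" quoted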

>
> Are you starting to see the problem here? As I said originally, the
> mean average time needed to find all the enzymes in an enzymatic
> cascade is basically the same as the average time needed to find the
> single most complex enzyme in the cascade. You argued that the time
> needed would be pretty much the same as that needed to find all the
> proteins in a system requiring the same total minimum number of
> residues even regardless of the fact that this system also requires
> each protein part to be specifically specified relative to all the
> other protein parts in that system.


Again, we are not discussing time. We are discussing the "number of
fairly specified" base pairs as measured by your own proposed method
for measuring that number, which is to estimate a "minimum size
requirement" and then a proportion of sequences of said minimum size
that perform the function in question. You are the only person in the
discussion who also thinks that this number, whether accurate or not,
tells us anything meaningful about how much time it takes something to
evolve.

>
> As you can see, this simply isn't true. A system that does not require
> overall specificity of arrangement of all its parts is not anywhere
> near the degree of specificity of one that does.
> This translates into
> a marked increase in the amount of average time needed to evolve the
> system that has the overall specificity requirement compared to the one
> that does not - even given the same minimum size requirement.


Specificity sensu Pitman, as you have told me "over and over again" is
basically the proportion of sequences of a given minimum size that code
for or perform a particular function. I didn't see any estimates of
that proportion in your discussion above, (indeed, the concept appears
to have vanished from your argument altogether) so I don't see what
warrants your assertions about specificity here.

>
> Sean Pitman
> www.DetectingDesign.com

John Wilkins

unread,
Jul 27, 2006, 9:48:56 PM7/27/06
to
Richard Forrest <ric...@plesiosaur.com> wrote:

> and the author of one of the standard
> textbools on Bayesian statistics

Is that like the priors of a leetle white bool?
--
John S. Wilkins, Postdoctoral Research Fellow, Biohumanities Project
University of Queensland - Blog: scienceblogs.com/evolvingthoughts
"He used... sarcasm. He knew all the tricks, dramatic irony, metaphor,
bathos, puns, parody, litotes and... satire. He was vicious."

Richard Forrest

unread,
Jul 28, 2006, 6:32:54 AM7/28/06
to

John Wilkins wrote:
> Richard Forrest <ric...@plesiosaur.com> wrote:
>
> > and the author of one of the standard
> > textbools on Bayesian statistics
>
> Is that like the priors of a leetle white bool?


No, just a load of boolocks like most mathematics...

Seanpit

unread,
Jul 28, 2006, 10:08:06 AM7/28/06
to

That's right . . .

> I haven't read the entire thread, but I assume you are arguing that a
> cascade is a ratchet that allows the first black marble to be locked into
> place, then in the next 1e6 draws (as it were) would be expected to produce
> the second marble ('sum').

That's basically correct . . . This characteristic is a feature of low
overall specificity of a system compared to an equally sized system
were every part must be specifically arranged relative to every other
part. The time to achieve success for a system of lower specificity is
dramatically reduced - exponentially. For a cascading system, the time
is reduced, given a certain number of searchers, to about the time
needed to gain the most complex subpart of the system. Von still
doesn't seem to understand that this is all about the average time
involved - the average time needed to find, via random walk or random
selection, all the needed aspects of a system.

> And then argue that evolution requires two
> directly consecutive (in effect simultaneous)
> successful draws to cross the
> neutral gap ('product').

That is not my argument at all. Evolution does not require consecutive
events to achieve success. Not at all. However, evolution does
require that the "correct" parts that were there before are still there
by the time the last needed part is realized. This becomes
exponentially more difficult to do if every part in a system must be
specifically arranged relative to every other part. For a functional
system of a given size, reducing this specificity requirement, even a
little bit, dramatically reduces the mean time needed to find all the
needed parts of that system in sequence space.

> Ironically, the ratchet is much like how evolution in general is expected to
> work. Each small change adding some selectable value.

Sure. That's how it works when it works. However, for functions that
require longer minimum sizes and greater overall specificity before
they will work at all, the subfunctions become more and more widely
spaced until what must be added to achieve the next steppingstone
function creates a wider and wider linear gap - which translates into
an exponentially longer random walk or number of random selection
events.

> By the way, Behe and Snoke showed that reasonable so-called neutral gaps are
> no problem for a population of prokaryotes to cross in short order just with
> point-mutations, even making some very conservative assumptions. And they
> admit that other mechanisms (e.g. homologous recombination) may provide much
> faster changes.
> http://www.proteinscience.org/cgi/reprint/ps.04802904v1.pdf

Of course "reasonable" neutral gaps aren't a problem. It's the
"unreasonable" neutral or non-beneficial gaps that become problematic
for evolution. At higher and higher levels of minimum functional
complexity these gaps grow linearly and the mean time required to cross
these gaps grows exponentially.

> Meanwhile, Bogarad & Deem say that "DNA base substitution, in the context of
> the genetic code, is ideally suited for the generation, diversification, and
> optimization of local protein space",

Sure, at very low levels of functional complexity. The problem is,
this sort of thing just doesn't help beyond very low levels. Novel
functions requiring 3-4kbps of fairly specified DNA, at minimum, just
don't evolve. That is why Von was trying to argue that cascading
functions that do evolve require many thousands of bp of genetic real
estate. The problem with cascading functions is that they simply are
not very specified. Their overall specificity is no greater than that
of the most specified single subpart of the system.

> and then provide empirical evidence
> and an explanatory model of still other natural mechanisms such "that
> organization into higher-order fundamental units such as nucleic acids, the
> genetic code, secondary and tertiary structure, cellular
> compartmentalization, cell types, and germ layers allows systems to escape
> complexity barriers and potentiates explosions in diversity".
> http://www.pnas.org/cgi/content/full/96/6/2591

None of these stories about how evolution is supposed to create
high-level functions has ever been demonstrated in real life or even on
paper in a remotely tenable way. These notions are just as mixed up as
yours are when you promote your word evolution programs as actually
modeling random mutations (the rate of which is much different for
different kinds of mutations) and selection based on a gained
beneficial meaning or function. With these limitations in place, the
ratio of beneficial vs. non-beneficial does indeed dramatically
decrease with each step up the ladder of minimum size and specificity
requirements. This dramatic decrease in ratio creates linearly
increasing gap sizes between what is and the next closest potentially
beneficial sequence at the same level or greater. And, a linear
increase in the gap size translates into an exponential increase in the
mean time before such a sequence will be found via any sort of random
search.

> Also, keep in mind that ignorance is not evidence, and we can't make a
> scientific assertion of an Intelligent Designer every time there is a Gap in
> human knowledge.

Scientific method is fundamentally based on ignorance as evidence. The
lack of evidence that counters a theory - a theory which is actually
open to falsification - is part of the value of a useful theory. In
other words, the lack of negative evidence that goes counter to a
theory helps to increase the predictive value of the theory.

> Zachriel's
> Word Mutation and Evolution Experiment
> And it takes less than "zillions of years"!
> http://www.zachriel.com/mutagenation/

Sean Pitman
www.DetectingDesign.com

Seanpit

unread,
Jul 28, 2006, 10:11:28 AM7/28/06
to

Gerry Murphy wrote:
> "Seanpit" <seanpi...@naturalselection.0catch.com> wrote in message
> news:1153882557.5...@i42g2000cwa.googlegroups.com...
> >
> > Gerry Murphy wrote:
> > > "Seanpit" <seanpi...@naturalselection.0catch.com> wrote in message
> > > news:1153788716.3...@p79g2000cwp.googlegroups.com...
> > > >
> > > <snip>
> > >
> > > > What is the mean average number of draws I'd have to make?
> > >
> > > Where the hell do you get this nomenclature, "mean average"?
> > > Your using the same textbook as nando, aren't you?
> >
> > There are different types of averages - right? The mean, median, and
> > mode?
> >
>
> Yes, and that's how we describe them. Tacking on 'average' adds nothing.

I was doing this for emphasis since many who posted in this thread
didn't seem to realize that there are different kinds of "averages" and
that the numbers I was using were indeed correct given that I was
talking about the "mean". I thought that just using the term "average"
or "mean" by itself would be good enough - but evidently not.

Seanpit

unread,
Jul 28, 2006, 10:46:40 AM7/28/06
to

Von R. Smith wrote:
> Seanpit wrote:
> > Von R. Smith wrote:
> >
> > < snip >
> >
> > > > > The probability of two independent
> > > > > events occuring is the sum of the
> > > > > probabilites of each occuring.
> > > >
> > > > Again, the mean average number of equally probable trials before
> > > > success is 1/prob of success. And, the mean average number of trials
> > > > before success for 2 successes is (1/pr success) + (1/pr success).
> > >
> > > Which directly contradicts your claim about summing the probabilities
> > > of the individual successes.
> >
> > That's true. But, it also contradicts your claim that cascading systems
> > of function have the same degree of specificity as functions that
> > require each individual subpart to be specifically oriented relative to
> > every other part in the system (i.e., a fully specified system of
> > function) - which is the main point of this discussion.
>
> How does it contradict this, Sean? Your metric of specification (as
> you have told me "over and over again") is given by a minimum size
> requirement,

Size has nothing to do with "specificity" or the degree that parts
(residues in this case) must be specifically arranged relative to each
other.

> plus some measurement of sequence constraint, which you
> usually give as the proportion of sequences of said minimum size
> serving the function in question.

The minimum size is the absolute number of residues needed to perform a
specific function. In other words, when no protein below a certain
size could give rise to a certain function, that is the minimum size
requirement for that function. However, this has nothing to do with the
minimum degree of specificity required by that function. The protein
that requires a minimum size of 100aa may be quite flexible as far as
what residue fills a given position or even most positions.

Some proteins are much more flexible than others in what variations in
sequence and overall 3D appearance can be tolerated before the function
in question is completely lost. The more limited
the degree of sequence and tertiary structure variability, the greater
the minimum specificity of that function.

For example, the function of cytochrome c (CC) carries with it a
minimum size requirement of about 80aa as well as a fairly high degree
of sequence specificity - i.e., little toleration for overall
variation.

These limitations produce an overall ratio of sequences in sequence
space of a given size that could produce a given function. It is this
ratio that is important to this discussion here. A cascading system of
function has a relatively high ratio in that all the parts of a
cascading system of function will fit into a much much smaller sequence
space compared to a function that requires each of its subparts to be
specifically oriented relative to all the other parts in the system.

> I don't see where expected successes
> in a number of Bernoulli trials even
> figure into the picture. More on
> this below.

This is what the entire discussion is about - the average number of
random searches before success is realized. How long will evolution
take, on average? That is the main question here. In my theory, the
answer to this question is closely tied to the ratio of sequences in
sequence space that could produce a particular type of function. The
ratio for cascading systems is much higher than it is for systems that
require more of their subparts to be oriented/specified with the other
parts in the system.

> > A cascading


> > system is therefore exponentially
> > easier to evolve than a fully
> > specified system.
>
> Let's test this notion against what you have told me elsewhere. You
> have explained to me "over and over again" that specificity is given by
> two measurements:

Go back and read what I've actually said. What I said, over and over
again, was that functional complexity is measured by the minimum size
and specificity of a system.


> a) A minimum size requirement.

One of the components of function complexity.

> b) A measurement of "specificity", which you typically estimate as the
> proportion of all possible sequences of aforementioned minimum size
> performing the function in question.

This is a definition of the minimum size requirement - not specificity.
Specificity is a measure of the minimum degree of variability of a
sequence before the function in question is lost. These are very
different concepts.

The rest of your post is based on this misconception and should
therefore be cleared up by a correct understanding of the difference
between the definitions of minimum size and specificity as they
collectively define functional complexity.

If these numbers did in fact represent the minimum size for each of the
enzymes in this cascade, and even if each of these enzymes were
absolutely specified in their sequence requirements, it wouldn't
matter. The overall functional complexity of the entire cascade
wouldn't be much more than the largest enzyme in the cascade (given the
same degree of absolute specificity for each part). If you want to
argue that a smaller enzyme, in reality, has the greatest specificity,
that's fine. But, it has a smaller minimum size requirement so the
overall degree of functional complexity of the system wouldn't be very
significant - not even close to 3 or 4 kbps of DNA coding for a
functional system that has a fair degree of minimum specificity.

> > As I said originally, the
> > mean average time needed to find all the enzymes in an enzymatic
> > cascade is basically the same as the average time needed to find the
> > single most complex enzyme in the cascade. You argued that the time
> > needed would be pretty much the same as that needed to find all the
> > proteins in a system requiring the same total minimum number of
> > residues even regardless of the fact that this system also requires
> > each protein part to be specifically specified relative to all the
> > other protein parts in that system.
>
>
> I am not talking about time at all. I am talking about a supposedly
> synchronic property: the number of "fairly specified" residues in a
> particular cascade, and the number of "fairly specified" base pairs in
> the genetic sequences that code for its enzymes. Whether that number,
> once calculated, will also say anything meaningful about how long it
> takes a particular function to evolve is one of those open issues that
> is ever in dispute in our threads.

What we are actually talking about here is the ratio of sequences that
could code for a particular system of beneficial function in sequence
space vs. the number of potential sequences that could not. A
cascading system of function has a much much higher ratio compared to a
system that requires more of its parts to be specifically oriented with
each other.

> > As you can see, this simply isn't true. A system that does not require
> > overall specificity of arrangement of all its parts is not anywhere
> > near the degree of specificity of one that does.
>
>
>
> The specificity, according to your own rules, is measured by the
> proportion of sequences of a hypothetical minimum length that actually
> code for/perform the function in question.

Not true - this is the definition of the minimum size requirement - not
specificity.

> It is not a measurement of
> time. Whether a function of a given specificity could evolve in a
> given amount of time was precisely one of the points at issue. You
> don't get to rig the discussion by redefining your claims to make them
> tautologous.

I've defined these terms over and over again for you. You are just
mixing up definitions that have been in place for some time now. I'm
not sure if you are doing this deliberately or not?

If you go back and look at the original posts between us, you will see
that this is exactly what I originally presented.

> > > > What I said was the average number of
> > > > attempts needed to achieve success, the mean average, is the sum of the
> > > > number of attempts needed to achieve success for each of the parts of
> > > > your protein system.
> > >
> > > > Relatively speaking this total number is very
> > > > close to the number of attempts needed to achieve success for the most
> > > > complex subpart in your cascading system - and nowhere near the number
> > > > of attempts needed, on mean average, to find a sequence that could give
> > > > rise to a system of function where each subpart must be specifically
> > > > oriented relative to all the other subparts in space.
> > >
> > > That follows if and only if "two independent events" = "two occurences
> > > of the same event in a series of trials". Which you insist you never
> > > said.
> >
> > I don't understand this to be true. It seems to me that this "follows"
> > regardless of if "two independent events" = "two occurrences of the
> > same event in a series of trials" or if the "two independent events"
> > are not the same and have different odds of success.
>
> If two processes are not the same, then there is no justification for
> treating them mathematically as if they were the same.

You don't treat them as if they were the same. However, you can still
calculate the mean number of trials before all are realized by adding
the inverses of the odds of success of each individual event. It
doesn't matter if the odds of success are different. The overall
number of trials (the mean average) before all are realized can still
be calculated in this manner.

Sean Pitman
www.DetectingDesign.com

Seanpit

unread,
Jul 28, 2006, 10:54:34 AM7/28/06
to
Von R. Smith wrote:

> > You are the one who is basically trying to argue that finding 4 or 5
> > short sequences is just as hard as finding one long sequence of
> > equivalent total length. This doesn't seem to be true.
>
> I'm not arguing anything of the sort. We already know that the 2,4-DNT
> cascade evolved in no more than a few decades. The question we are
> discussing is how many "fairly specified" base pairs are required to
> code for it. You have already explained to me "over and over again",
> how one does this: it has something to do with the minimum size
> requirement, and the degree of constraint on that minimally-sized
> sequence if it is to perform a given function.

Minimum size is not part of the definition of specificity. It is
rather part of the definition of functional complexity. I will repost
a prior response to a post of yours where your arguments are very
similar to the ones you provide in this post:

____________

> > For example, lets say that a particular cascading system requires 5
> > absolutely specified residues. Let's say that the sizes of each is:
> > 150aa, 200aa, 300aa, 350aa, and 500aa. How many searches, on mean
> > average, would it take to find all of them?
> >
> > (1/1e-195) + (1/1e-260) + (1/1e-390) + (1/1e-455) + (1/1e-650) = ~1e650
> > searches.
> >

> > Is this equating apples and oranges?
> >

> > Now, compare this with the mean average number of searches it would
> > take to find just one sequence of 1500aa (1/1e-1951) = 1e1951. Not
> > even remotely comparable - right?
> >

> > Are you starting to see the problem here?
>
>

> > As I said originally, the


> > mean average time needed to find all the enzymes in an enzymatic
> > cascade is basically the same as the average time needed to find the
> > single most complex enzyme in the cascade. You argued that the time
> > needed would be pretty much the same as that needed to find all the
> > proteins in a system requiring the same total minimum number of
> > residues even regardless of the fact that this system also requires
> > each protein part to be specifically specified relative to all the
> > other protein parts in that system.
>
>

> I am not talking about time at all. I am talking about a supposedly
> synchronic property: the number of "fairly specified" residues in a
> particular cascade, and the number of "fairly specified" base pairs in
> the genetic sequences that code for its enzymes. Whether that number,
> once calculated, will also say anything meaningful about how long it
> takes a particular function to evolve is one of those open issues that
> is ever in dispute in our threads.

What we are actually talking about here is the ratio of sequences that
could code for a particular system of beneficial function in sequence
space vs. the number of potential sequences that could not. A
cascading system of function has a much much higher ratio compared to a
system that requires more of its parts to be specifically oriented with
each other.

> > As you can see, this simply isn't true. A system that does not require


> > overall specificity of arrangement of all its parts is not anywhere
> > near the degree of specificity of one that does.
>
>
>

> The specificity, according to your own rules, is measured by the
> proportion of sequences of a hypothetical minimum length that actually
> code for/perform the function in question.

Not true - this is the definition of the minimum size requirement - not
specificity.

> It is not a measurement of
> time. Whether a function of a given specificity could evolve in a
> given amount of time was precisely one of the points at issue. You
> don't get to rig the discussion by redefining your claims to make them
> tautologous.

I've defined these terms over and over again for you. You are just
mixing up definitions that have been in place for some time now. I'm
not sure if you are doing this deliberately or not?

> > This translates into


> > a marked increase in the amount of average time needed to evolve the
> > system that has the overall specificity requirement compared to the one
> > that does not - even given the same minimum size requirement.
>
>

Seanpit

unread,
Jul 28, 2006, 10:50:43 AM7/28/06
to

The minimum size requirement has nothing to do with measuring
specificity. It is an independent measurement. Together with
specificity, it defines functional complexity - as I describe in detail
in the below pasted portion of a post to another very similar response
of yours:

______________

require more of their subparts to be oriented/specified with the other
parts in the system.

> > A cascading

> > For example, lets say that a particular cascading system requires 5
> > absolutely specified residues. Let's say that the sizes of each is:
> > 150aa, 200aa, 300aa, 350aa, and 500aa. How many searches, on mean
> > average, would it take to find all of them?
> >
> > (1/1e-195) + (1/1e-260) + (1/1e-390) + (1/1e-455) + (1/1e-650) = ~1e650
> > searches.
> >

> > Is this equating apples and oranges?
> >

> > Now, compare this with the mean average number of searches it would
> > take to find just one sequence of 1500aa (1/1e-1951) = 1e1951. Not
> > even remotely comparable - right?
> >

Augray

unread,
Jul 28, 2006, 12:38:32 PM7/28/06
to
On 25 Jul 2006 19:15:29 -0700, "Seanpit"
<seanpi...@naturalselection.0catch.com> wrote in
<1153880129.0...@i3g2000cwc.googlegroups.com> :

So why do you keep using the term "mean average"?


>The mode, in this case, is 1, the median just
>shy of 4, and the mean is 6. Look it up. And, if you don't believe
>me, do the experiment yourself or, better yet, get snex to explain it
>to you.
>
>> >> >> >Now, how many
>> >> >> >times would you have to roll the dice, on average, to get "6" twice?
>> >> >> >You'd have to roll the dice 12 times. That's right, only 12 times.
>> >> >>
>> >> >> No, that's not true either.
>> >> >
>> >> >What's the mean average then?
>> >>
>> >> I suggest you actually do it, and find out.
>
>The answer is 12. That is the mean average number of dice throws it
>would take to get six twice.

Run Data Count
--------------------------------------------------------
1 - 6,1,6 3
2 - 1,3,4,4,5,5,6,2,5,4,5,1,2,5,3,6 16
3 - 6,1,3,1,3,6 6
4 - 3,3,5,3,6,4,2,5,6 9
5 - 4,1,6,6 4
6 - 3,6,6 3
7 - 4,5,3,1,1,5,5,1,3,6,1,5,5,4,1,2,5,4,6 19
8 - 3,2,3,3,3,1,2,4,3,6,2,4,5,3,4,2,4,6 18
9 - 4,4,6,4,5,1,2,6 8
10 - 3,2,6,3,2,2,1,2,2,6 10
11 - 4,4,1,4,3,1,4,4,4,2,3,3,3,2,1,2,5,5,5,6,1,2,6 23
12 - 1,5,2,6,3,3,5,1,5,3,3,3,2,5,1,5,6 17
13 - 6,1,5,6 4
14 - 2,6,2,2,6 5
15 - 6,3,6 3
16 - 6,2,4,3,4,1,1,1,4,2,6 11
17 - 5,2,4,2,1,4,5,6,2,1,6 11
18 - 2,3,6,3,2,5,6 7
19 - 2,3,5,6,5,4,5,4,5,3,3,4,5,2,1,3,6 17
20 - 1,6,4,1,6 5
21 - 1,5,4,2,5,2,2,3,3,2,6,1,5,3,6 15
22 - 1,6,2,6 4
23 - 2,5,4,6,3,6 6
24 - 2,1,6,2,3,1,4,5,4,6 10
25 - 6,2,1,6 4
========================================================
Total Throws = 238
Mean = 9.25
Median = 8
Mode = 4

snex

unread,
Jul 28, 2006, 1:06:17 PM7/28/06
to

your data set isnt large enough. with 10,000 trials i have mean = 12
and median = 10.

snex

unread,
Jul 28, 2006, 1:25:18 PM7/28/06
to

also, the mode keeps fluctuating between 6 and 9. when i set trials to
100,000, the mode is either 6 or 7.
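[A minimal version of the simulation being described, for anyone who wants
to reproduce these figures; the mean, median and mode will wobble slightly
from run to run:]

import random
from statistics import mean, median, mode

def rolls_until_two_sixes():
    rolls = sixes = 0
    while sixes < 2:
        rolls += 1
        if random.randint(1, 6) == 6:
            sixes += 1
    return rolls

results = [rolls_until_two_sixes() for _ in range(100_000)]
print(mean(results))    # hovers around 12
print(median(results))  # around 10
print(mode(results))    # usually 6 or 7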

Nic

unread,
Jul 28, 2006, 1:48:33 PM7/28/06
to

Seanpit wrote:
> snex wrote:
> > Seanpit wrote:
> > > snex wrote:
> > > > Seanpit wrote:
> > > > > snex wrote:
> > > > > > snex wrote:

> > > > > > > Seanpit wrote:
> > > > > > > > Gerry Murphy wrote:
> > > > > > > > > "Seanpit" <seanpi...@naturalselection.0catch.com> manifested his
> > > > > > > > > innumeracy in message
> > > > > > > > > news:1153754778.5...@i42g2000cwa.googlegroups.com...
> > > > > > > > >
> > > > > > > > > <snip>
> > > > > > > > > > The odds that they will occur in a given set of a sum of their
> > > > > > > > > > independent odds is 50:50. For example, how many times would you have

> > > > > > > > > > to role a dice to get the side with six dots on it? You'd have to roll
> > > > > > > > > > the dice 6 times, on average.
> > > > > > > > >
> > > > > > > > > No, you have to roll it just under 4 times, on average.
> > > > > > > > >
> > > > > > > > > You really should stop embarrassing yourself with this display of
> > > > > > > > > breathtaking ignorance.
> > > > > > > >
> > > > > > > > What is the mean average number of rolls?
> > > > > > >
> > > > > > > the probability that you will roll a 6 within N trials can be calculated
> > > > > > > by the following recursive formula:
> > > > > > >
> > > > > > > P(N) = (5/6)^(N-1) * P(N-1)
> > > > > >
> > > > > > correction: P(N) = (1/6)*((5/6)^(N-1)) + P(N-1)
> > > > > >
> > > > > > i hate copying math from paper where it looks nice into a computer
> > > > > > where i have to make it conform to ascii. :/
> > > > > >
> > > > > > >
> > > > > > > where P(1) = 1/6.
> > > > > > >
> > > > > > > by solving for 0.5 (the point at which 50% of the time you would have
> > > > > > > rolled a 6), you get slightly under 4, as stated by gerry.
> > > > >
> > > > > Isn't that the median average rather than the mean?
> > > >
> > > > no, and i dont even need to delve into the mathematics to explain why.
> > > >
> > > > first of all, there is no median, because the domain is infinite.
> > > > medians only exist when there are a finite number of trials, but when
> > > > determining how many rolls it takes to get a 6, you could theoretically
> > > > roll forever and never get one.
> > >
> > > I agree that the median average is around 4, but I'm still not sure
> > > about the mean average - because of the skewed nature of the curve. It
> > > seems to me that the mean wouldn't be the same as the median in this
> > > case. It seems to me like it should be 6.

> > >
> > > "The theoretical distribution for large numbers of throws of unbiased
> > > dice has: mode = 1 median = 4 (The mean, not calculated here, is 6.)"
> > >
> > > http://www.rsscse.org.uk/pose/level1/book5/notes.htm
> > >
> > > > secondly, the domain consists only of integers, and the average is
> > > > ~3.80178402, which is not an integer. the only average of an integer
> > > > set that can be a non-integer is the mean.
> > > >
> > > > if you want to delve into the math, youll have to convert my recursive
> > > > formula into a geometric series, and then use the mean value theorem
> > > > integral. my bet is that youll get ~3.80178402.
> > > >
> > > > if you still arent convinced, i can write a program that lets you
> > > > choose the number of trials and tells you the mean, median, and mode
> > > > for the number of rolls required to roll a 6.
> > >
> > > That would be great.
> >
> > eh, well the program i wrote seems to be agreeing with you. mean keeps
> > coming up 6 and median 4 (when n-trials is large).
>
> Hmmmm . . . So, I'm not crazy after all? ; )

Unfortunately Sean, this dice question is one that all normal numerate
people *get wrong* when they first encounter it. They take a lot of
convincing that the answer is really 6.

They reason correctly that the mean number of throws between successive
sixes is 6. They then reason *incorrectly* that each time you start a
new trial, you must on average come in half-way along such a gap, and
therefore should be due to get a six, on average, after only 3 throws.

> >
> > >
> > > >
> > > > >
> > > > > > > > Sean Pitman
> > > > > > > > www.DetectingDesign.com
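[The single-six case Nic describes can be worked out exactly rather than
simulated - the waiting time for the first six is geometric with p = 1/6.
A small sketch, for reference:]

p = 1 / 6
N = 400                                # far enough out that the neglected tail is tiny
probs = [(1 - p) ** (n - 1) * p for n in range(1, N + 1)]   # P(first six on roll n)

mean_rolls = sum(n * pr for n, pr in enumerate(probs, start=1))

cumulative, median_rolls = 0.0, None
for n, pr in enumerate(probs, start=1):
    cumulative += pr
    if cumulative >= 0.5:
        median_rolls = n
        break

mode_rolls = 1 + probs.index(max(probs))

print(round(mean_rolls, 2))   # ~6.0: the mean really is 1/p = 6 rolls
print(median_rolls)           # 4: half of all attempts succeed within 4 rolls
print(mode_rolls)             # 1: the single most likely waiting time is 1 roll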

Nic

unread,
Jul 28, 2006, 2:01:26 PM7/28/06
to

Augray wrote:
> On 25 Jul 2006 19:15:29 -0700, "Seanpit"

<snip>

Strange - out of 238 throws,
6 has come up 50 times, and
1 has come up 34 times.

I can't work out if that is a likely outcome for that number of throws,
but intuitively it ain't likely and so I'd mistrust your dice!

<snip>

Von R. Smith

unread,
Jul 28, 2006, 4:46:16 PM7/28/06
to


You will note that, in the text above, I did not say that it did. In
one post I made the mistake of using the term "specificity" to refer to
what in your terminology would actually be called "functional
complexity". I was more careful in the current post. Since your
response consists in large part of jumping up and down on that mistake,
I am going to snip it as non-responsive here, and restore the rest of
my other text:

<snip>

<restore>


>
> > You are trying to argue that it would be
> > equally easy and/or difficult to achieve one as well as the other since
> > they both have the same size and individual specificities of their
> > individual parts.
>
>
> No, I am trying to argue that it is absurd to claim that some general
> rule guarantees that the proportion of possible 548aa-long peptides
> functioning as 4M5NC monoxygenase is the same as the proportion of
> possible 1,888aa-long sequences that degrade 2,4-DNT into pyruvate.
>
>
> > What you fail to realize is that the additional
> > specificity requirement, that each part be specifically oriented
> > relative to every other part in the system, dramatically increases the
> > overall specificity of a system - making it much much much more
> > difficult to realize in a given period of time (compared to a cascading
> > system).
>
>
> If that is so, then the appropriate way to model this is to give lower
> proportions of suitable sequences in the minimum sequence space, not to
> jettison your own stated criteria just for cascades.
>
> <snip rest>

<end restore>


If you don't want to respond to any of the points made here because you
think doing so would be repetitious, fine. Don't.

Von R. Smith

unread,
Jul 28, 2006, 5:02:39 PM7/28/06
to

Seanpit wrote:
> Von R. Smith wrote:
>
> > > You are the one who is basically trying to argue that finding 4 or 5
> > > short sequences is just as hard as finding one long sequence of
> > > equivalent total length. This doesn't seem to be true.
> >
> > I'm not arguing anything of the sort. We already know that the 2,4-DNT
> > cascade evolved in no more than a few decades. The question we are
> > discussing is how many "fairly specified" base pairs are required to
> > code for it. You have already explained to me "over and over again",
> > how one does this: it has something to do with the minimum size
> > requirement, and the degree of constraint on that minimally-sized
> > sequence if it is to perform a given function.
>
> Minimum size is not part of the definition of specificity.


But it is part of your method for determining the number of "fairly
specified base pairs". You will note that I did not make the same
mistake in Pitman terminology here that I did in that other post, so
much of your reposted response is off-point for the message which I
will now repost below:

<restore>


>
> Now, I know that you really really want your number of "fairly
> specified" base pairs (or amino acids, or whatever) to also say
> something about how long it would take a structure serving some
> function to evolve; I know that you really really want your audience to
> believe that this number says something about the size of "neutral
> gaps". But your desired conclusion doesn't follow from your stated
> premises.
>
>
> >

> > For example, lets say that a particular cascading system requires 5
> > absolutely specified residues. Let's say that the sizes of each is:
> > 150aa, 200aa, 300aa, 350aa, and 500aa. How many searches, on mean
> > average, would it take to find all of them?
> >
> > (1/1e-195) + (1/1e-260) + (1/1e-390) + (1/1e-455) + (1/1e-650) = ~1e650
> > searches.
> >

> > Now, compare this with the mean average number of searches it would
> > take to find just one sequence of 1500aa (1/1e-1951) = 1e1951. Not
> > even remotely comparable - right?
>
>

> Congratulations! You have just shown that your strawman version of
> evolution (which you repeatedly claim not to be using, but which always
> slips right back into your argument, as it just did above) doesn't look
> anything like the way it actually works.
>
> >

> > Are you starting to see the problem here? As I said originally, the


> > mean average time needed to find all the enzymes in an enzymatic
> > cascade is basically the same as the average time needed to find the
> > single most complex enzyme in the cascade. You argued that the time
> > needed would be pretty much the same as that needed to find all the
> > proteins in a system requiring the same total minimum number of
> > residues even regardless of the fact that this system also requires
> > each protein part to be specifically specified relative to all the
> > other protein parts in that system.
>
>

> Again, we are not discussing time. We are discussing the "number of
> fairly specified" base pairs as measured by your own proposed method
> for measuring that number, which is to estimate a "minimum size
> requirement" and then a proportion of sequences of said minimum size
> that perform the function in question. You are the only person in the
> discussion who also thinks that this number, whether accurate or not,
> tells us anything meaningful about how much time it takes something to
> evolve.
>
> >

> > As you can see, this simply isn't true. A system that does not require
> > overall specificity of arrangement of all its parts is not anywhere
> > near the degree of specificity of one that does.

> > This translates into
> > a marked increase in the amount of average time needed to evolve the
> > system that has the overall specificity requirement compared to the one
> > that does not - even given the same minimum size requirement.
>
>
>
>

> Specificity sensu Pitman, as you have told me "over and over again" is
> basically the proportion of sequences of a given minimum size that code
> for or perform a particular function. I didn't see any estimates of
> that proportion in your discussion above, (indeed, the concept appears
> to have vanished from your argument altogether) so I don't see what
> warrants your assertions about specificity here.
>

<end restore>


As you can see, there is no confusion here of specificity with minimum
size. If you still don't like my paraphrase of your concept of
specificity here, feel free to correct it. Now I understand that one
can quibble that counting the number of fairly-specified amino acids
properly consists of a bit more than just estimating the proportion of
functional sequences in sequence space, but that is how you yourself
typically estimate specificity; detailed descriptions of sequence
conservation are rare in your discussions that I have read.

Von R. Smith

unread,
Jul 28, 2006, 7:12:55 PM7/28/06
to

Seanpit wrote:
> Von R. Smith wrote:
> > Seanpit wrote:
> > > Von R. Smith wrote:
> > >
> > > < snip >
> > >
> > > > > > The probability of two independent
> > > > > > events occuring is the sum of the
> > > > > > probabilites of each occuring.
> > > > >
> > > > > Again, the mean average number of equally probable trials before
> > > > > success is 1/prob of success. And, the mean average number of trials
> > > > > before success for 2 successes is (1/pr success) + (1/pr success).
> > > >
> > > > Which directly contradicts your claim about summing the probabilities
> > > > of the individual successes.
> > >
> > > That's true. But, it also contradicts your claim that cascading systems
> > > of function have the same degree of specificity as functions that
> > > require each individual subpart to be specifically oriented relative to
> > > every other part in the system (i.e., a fully specified system of
> > > function) - which is the main point of this discussion.
> >
> > How does it contradict this, Sean? Your metric of specification (as
> > you have told me "over and over again") is given by a minimum size
> > requirement,
>
> Size has nothing to do with "specificity" or the degree that parts
> (residues in this case) must be specifically arranged relative to each
> other.


True enough; the Pitman term I was looking for was "functional
complexity". It is easy, when dealing with your idiosyncratic
terminology, to mistake the number of "fairly specified base-pairs" for
a measurement of specificity. Correcting this terminology point
doesn't really affect the main argument, though.


Great, so we can clear this whole "cascade rule" thing up right now.
All you have to do is answer the following questions:

1) What proportion of 548aa peptides function effectively as 4M5NC
monoxygenases?

2) What proportion of 1,888aa peptide sequences degrade 2,4-DNT?

3) How did you arrive at these numbers, and how could an independent
researcher verify them?

4) Are your answers to 1) and 2) the same number?

I added a "show your work" requirement, because I am not interested in
seeing you "assert" that 1) and 2) are the same number.

Let me give you a slight math hint:

The only way that 1) and 2) can be the same number is if the rest of
the cascade, apart from the largest enzyme, can be *any* possible
sequence in the sequence space, in other words, if dntB plus any random
peptide sequence "filling out" the minimum size requirement could
perform the function of the cascade.

But wait: actually, since we are discussing the number of "fairly
specified" *base pairs*, what you actually need is for any random
*nucleotide* sequence of roughly 4kb (including ones that don't code
for the appropriate number of peptides) plus a dntB gene, to be able to
produce an effective cascade. If this is not true, then the answers to
1) and 2) above cannot be the same number. And that means that the
specificity cannot be the same.

I can show you the simple algebra that demonstrates this, if you aren't
following.

If you actually apply your math, you will find that the proportion of
total sequence space performing the cascade function will be the
*product* of the proportions of the respective sequence spaces for all
four enzymes. So that if the proportion for each enzyme in its
respective sequence space is 1e-13, the proportion of sequences of the
total length of the cascade that actually perform all four steps will
be about 1e-52 (assuming that the overlap is random). Again, if you
don't trust my math, ask your math faculty at Loma Linda.
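[The "simple algebra" referred to here is just the multiplication rule for
independent constraints; a minimal sketch using the illustrative 1e-13
figure from the post, not a measured value:]

import math

# illustrative per-enzyme proportions (the 1e-13 figure used above); each is
# the fraction of that enzyme's own sub-sequence space that performs its step
per_enzyme = [1e-13, 1e-13, 1e-13, 1e-13]

# if the four coding regions vary independently, a full-length sequence
# performs all four steps only when every region is functional at once,
# so the per-enzyme proportions multiply
whole_cascade = math.prod(per_enzyme)

print(whole_cascade)   # ~1e-52, the figure given above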

But wait, it is worse than that. Even if 1) and 2) *were* the same,
the cascade as a whole would *still* have a higher functional
complexity than just the largest enzyme in it, because, as you have
stated "over and over again":

A function that has greater minimum size requirements (given a constant
degree of specificity) will be at a higher level of functional
complexity.

Now, I don't think that you have ever explained just how you think
number of "fairly specified" base pairs varies as a function of minimum
size at a constant specificity, so perhaps now would be a good time to
do so. Is it linear? Exponential? Logarithmic? Quadratic?

This is enough to work on for now, so I will snip the rest. We can dig
up the other topics later.

Seanpit

unread,
Jul 28, 2006, 9:07:40 PM7/28/06
to

It doesn't matter, as far as your point is concerned, even if all the
residues were required to gain the function. Even if all of the
residues were required in each one of the enzymes in the 2,4-DNT
cascade, and, even if all of them were required to be absolutely
specified, the overall functional complexity of the system would not be
significantly greater than 548aa (i.e., the largest single-protein
enzyme in the cascade).

> 2) What proportion of 1,888aa peptide sequences degrade 2,4-DNT?

Again, it doesn't matter. Even if all of them were required, and, even
if all of them were required to be absolutely specified, it wouldn't
significantly affect the overall level of functional complexity beyond
the 548aa level.

For example, lets say that a system of function requires a minimum of 1
protein to work. This protein has to be at least 3aa in size and has
to be fully specified as far as the order of the resides within itself
in order to "work" at all. In other words, out of all the
possibilities in sequence space, only one 3aa sequence will be able to
do the job. What is the ratio of proteins with this particular
function in sequence space?

The ratio is 1 in 8000 (20^3).

Now, what if another system of function requires a minimum of two
proteins, each with the same size and specificity requirements as the
one above - just with two fully specified 3aa proteins instead of one?
What will the ratio be?

The sequence space is still made up of 8000 potential 3aa proteins.
What ratio of these proteins will work as at least one of the two
needed protein parts of the system in question? The ratio is 2 in 8000
or 1 in 4000.

The mean number of trials needed to find both is 8000 + 8000 = 16,000.


Now, what happens to the ratio if this two-protein system requires that
the proteins not only be internally specified, but specifically
oriented relative to each other - no other orientation will do? What
happens to the overall specificity of the system? It shoots up
dramatically. What does this do to the ratio of what will work vs.
what will not work?

The ratio drops dramatically to 1 in 64,000,000 (20^6).
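[A sketch of the arithmetic in this toy example, exactly as it is framed
above - two separate fully specified 3aa targets searched for
independently; none of these are measured biological numbers:]

residues = 20
length = 3
space = residues ** length             # 8,000 possible 3aa sequences

# one fully specified 3aa part: 1 working sequence out of 8,000
ratio_one_part = 1 / space

# two distinct fully specified 3aa targets with no required orientation
# between them: 2 of the 8,000 sequences now count as "a needed part"
ratio_either_part = 2 / space          # 1 in 4,000

# searching for each target separately, the mean waiting times simply add
mean_trials_both = space + space       # 8,000 + 8,000 = 16,000

# if the two parts must also sit in one specific mutual arrangement, the
# system behaves like a single fully specified 6aa sequence
space_6aa = residues ** (2 * length)   # 64,000,000

print(space, ratio_one_part, ratio_either_part, mean_trials_both, space_6aa)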

> 3) How did you arrive at these numbers, and how could an independent
> researcher verify them?

Irrelevant to the point . . .

> 4) Are your answers to 1) and 2) the same number?

You don't seem to have an understanding about the difference between
minimum size and minimum specificity for a function in question.

> I added a "show your work" requirement, because I am not interested in
> seeing you "assert" that 1) and 2) are the same number.
>
> Let me give you a slight math hint:
>
> The only way that 1) and 2) can be the same number is if the rest of
> the cascade, apart from the largest enzyme, can be *any* possible
> sequence in the sequence space, in other words, if dntB plus any random
> peptide sequence "filling out" the minimum size requirement could
> perform the function of the cascade.

Not true. Each enzyme could be completely specified so that only 1
enzyme in sequence space would do the job of that enzyme. Yet, if the
individual enzymes are not required to be in a specific orientation
with all the other enzymes in the system, the overall ratio of what
will work will be much greater.

> But wait: actually, since we are discussing the number of "fairly
> specified" *base pairs*, what you actually need is for any random
> *nucleotide* sequence of roughly 4kb (including ones that don't code
> for the appropriate number of peptides) plus a dntB gene, to be able to
> produce an effective cascade. If this is not true, then the answers to
> 1) and 2) above cannot be the same number. And that means that the
> specificity cannot be the same.
>
> I can show you the simple algebra that demonstrates this, if you aren't
> following.
>
> If you actually apply your math, you will find that the proportion of
> total sequence space performing the cascade function will be the
> *product* of the proportions of the respective sequence spaces for all
> four enzymes.

Not true. As explained above, the ratio of the sequences that can fill
the spot of at least one of the enzymes in the cascade is actually
higher than the ratio of a function that only requires the largest
enzyme (given the same degree of residue specificity). The mean number of
trials needed to find all of them is only the sum of the inverses of
each ratio.

> So that if the proportion for each enzyme in its
> respective sequence space is 1e-13, the proportion of sequences of the
> total length of the cascade that actually perform all four steps will
> be about 1e-52 (assuming that the overlap is random). Again, if you
> don't trust my math, ask your math faculty at Loma Linda.

Again, if the ratio for each of 4 enzymes, enzymes which are not
required to be specifically oriented with regard to each other in space
in order for the overall system/cascade to work, is 1e-13, the overall
ratio of the system of 4 enzymes will *not* be the product of 1e-13^4
or 1e-52. It will be 1e-13 / 4 = 2.5e-14. The mean number of
trials needed to gain all of them is the inverse of 1e-13, or 1e13,
times 4 - or 4e13 trials. This is a far cry from your suggested mean of
1e52 trials - which would be the mean if all the parts were fully
specified with regard to all the other parts.
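
The two bookkeeping schemes being argued over here - sum of the inverse
ratios vs. product of the ratios - can be contrasted with a small Monte
Carlo sketch in Python. The per-draw probability below is set far higher
than 1e-13 purely so the simulation finishes; only the structure of the
two models is the point, and neither is being endorsed as the right model
of the underlying biology:

import random

p = 0.1        # per-draw chance of hitting any single given part
parts = 4      # number of required parts
runs = 200     # repetitions used to estimate each mean

def draws_until_success(prob):
    # count random draws until one succeeds (a geometric waiting time)
    n = 0
    while True:
        n += 1
        if random.random() < prob:
            return n

# Model A: each part is found by its own independent search, one after
# another, so the total work is the sum of four separate waiting times.
model_a = [sum(draws_until_success(p) for _ in range(parts))
           for _ in range(runs)]

# Model B: a draw only counts if all four parts come up right in that
# same draw, i.e. the per-draw success probability is p**parts.
model_b = [draws_until_success(p ** parts) for _ in range(runs)]

print(sum(model_a) / runs)   # close to parts / p    = 40
print(sum(model_b) / runs)   # close to 1 / p**parts = 10000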

> But wait, it is worse than that. Even if 1) and 2) *were* the same,
> the cascade as a whole would *still* have a higher functional
> complexity than just the largest enzyme in it, because, as you have
> stated "over and over again":
>
> A function that has greater minimum size requirements (given a constant
> degree of specificity) with be at a higher level of functional
> complexity.
>
> Now, I don't think that you have ever explained just how you think
> number of "fairly specified" base pairs varies as a function of minimum
> size at a constant specificity, so perhaps now would be a good time to
> do so. Is it linear? Exponential? Logarithmic? Quadratic?
>
> This is enough to work on for now, so I will snip the rest. We can dig
> up the other topics later.

If a system does not require its protein parts to be specifically
arranged with each other, the ratio of potential proteins in sequence
space that will be able to fit in at least one spot in that system
increases dramatically. Why? Because a multiprotein system, which
requires specific orientation of its protein parts relative to each
other, is like a single protein where each individual residue is now
required to be in a specific orientation relative to a great many more
residues than it otherwise would be as simply part of a single protein.
Now, each individual residue is required to be specifically arranged
with hundreds of other residues as part of a larger system of function
that requires interprotein specificity of arrangement.

Compare this to the perspective of a single residue in a cascading
system where interprotein specificity of arrangement is not required.
That individual residue is only required to be specifically arranged
relative to the other residues in its own protein/enzyme - not to all
the other residues in all the other proteins.

Sean Pitman
www.DetectingDesign.com

Seanpit

unread,
Jul 28, 2006, 9:15:56 PM7/28/06
to

It was a simple slip-up. What I should have said was that the mean
number of trials needed to achieve two independent events is the sum of
the inverses of the probabilities of each. This little mistake makes
little difference to the main point at hand in any case - which you
evidently have yet to appreciate.

> So I suggest that you go away and learn something about the subject to
> avoid making a fool of yourself again.

And I suggest you pay attention to the main point of the discussion
instead of getting hung up on minor tangents.

>
> RF
>
> >
> > >
> > > RF
> > >
> > > >
> > > > >
> > > > > RF
> > > > >
> > > > > <snipped>

Seanpit

unread,
Jul 28, 2006, 9:21:20 PM7/28/06
to

True . . .

>
> > >
> > > >
> > > > >
> > > > > >
> > > > > > > > > Sean Pitman
> > > > > > > > > www.DetectingDesign.com

Richard Forrest

unread,
Jul 29, 2006, 4:16:22 AM7/29/06
to

It was a slip which revealed your fundamental lack of education in
statistics.
Nobody with any familiarity with statistical methods would have made
such a fundamental error.

This has been confirmed by your use of terms such as "mean average",
which are meaningless.


> What I should have said was that the mean
> number of trials needed to achieve two independent events is the sum of
> the inverses of the probabilities of each.

How the hell do you get from there to "The probabilities of two
independent events occurring is the sum of the probabilities of each
one occurring - not the product"?


> This little mistake makes
> little difference to the main point at hand in any case - which you
> evidently have yet to appreciate.

The main "point at hand" here is that you are proposing a statistical
argument against evolution, yet you are so evidently incompetent in
using statistics that you have had to be corrected with each statement
you make.

I don't understand why you don't go away and educate yourself on the
subject.

>
> > So I suggest that you go away and learn something about the subject to
> > avoid making a fool of yourself again.
>
> And I suggest you pay attention to the main point of the discussion
> instead of getting hung up on minor tangents.

The "main point of discussion" is that you are arguing in the basis of
a flawed statistical analysis of a model which which bears little
resemblance to that which any competent biologist would use that
evolution has limits.

The evidence from the real world shows no such limits.

If you want to form an argument, you need to demonstrate that the
interpretation of the evidence from the real world is flawed, and
provide a model which explains that evidence more robustly than the
models used by evolutionary scientists.

If you can't, all you are demonstrating is that your model is flawed -
which given your incompetence in basic statistics is hardly surprising.

RF

>
> >
> > RF
> >
> > >
> > > >
> > > > RF
> > > >
> > > > >
> > > > > >
> > > > > > RF
> > > > > >
> > > > > > <snipped>

Seanpit

unread,
Jul 29, 2006, 5:49:12 AM7/29/06
to

The "mean" is one kind of average. It is computed by summing the values
and dividing by the number of values. Two other common forms of
averages are the mode and the median. The mode is the most frequently
occurring value in a set. The median is the middle value of the set
when they are ordered by rank.

Evidently many in this thread don't understand that there are different
kinds of averages and that the use of the term "average", by itself,
usually refers to the arithmetic mean.
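
For anyone following along, the three kinds of average named here can be
computed directly with Python's standard statistics module (the sample
list below is purely illustrative):

from statistics import mean, median, mode

sample = [1, 1, 2, 4, 4, 4, 9, 15]
print(mean(sample))    # 5.0 - the sum of the values divided by their count
print(median(sample))  # 4.0 - the middle value once the list is sorted
print(mode(sample))    # 4   - the most frequently occurring value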

> > What I should have said was that the mean
> > number of trials needed to achieve two independent events is the sum of
> > the inverses of the probabilities of each.
>
> How the hell do you get from there to "The probabilities of two
> independent events occurring is the sum of the probabilities of each
> one occurring - not the product"?

I was thinking about the inverse ratio when I wrote this, but,
obviously, this isn't what came out. It was a long day and I wasn't
thinking straight. However, the main point remains the same. The
average time it takes to produce a system of function where the
individual parts are not required to be specifically oriented with each
other is dramatically reduced relative to a system that requires the
same minimum size as well as specific orientation of all of its parts.

> > This little mistake makes
> > little difference to the main point at hand in any case - which you
> > evidently have yet to appreciate.
>
> The main "point at hand" here is that you are proposing a statistical
> argument against evolution, yet you are so evidently incompetent in
> using statistics that you have had to be corrected with each statement
> you make.
>
> I don't understand why you don't go away and educate yourself on the
> subject.

Please, do tell me how I'm so far off base when it comes to the main
point of this thread? With your expertise in statistics, this should be
easy to do.

> > > So I suggest that you go away and learn something about the subject to
> > > avoid making a fool of yourself again.
> >
> > And I suggest you pay attention to the main point of the discussion
> > instead of getting hung up on minor tangents.
>
> The "main point of discussion" is that you are arguing in the basis of
> a flawed statistical analysis of a model which which bears little
> resemblance to that which any competent biologist would use that
> evolution has limits.

Please list your statistical counterargument, then.

> The evidence from the real world shows no such limits.

Yes, it does. Where is your example of a novel function evolving that
requires more than a few thousand fairly specified residues of genetic
real estate?

> If you want to form an argument, you need to demonstrate that the
> interpretation of the evidence from the real world is flawed, and
> provide a model which explains that evidence more robustly than the
> models used by evolutionary scientists.

I have. The evidence from the real world really does demonstrate an
exponential stalling out effect of evolutionary progress on the lowest
rungs of the ladder of functional complexity. You don't seem to
understand how the fitness landscape is affected by the exponential
decline in the ratio of potentially beneficial vs. potentially
nonbeneficial sequences.

> If you can't, all you are demonstrating is that your model is flawed -
> which given your incompetence in basic statistics is hardly surprising.

You have yet to show that my model is fundamentally flawed at all. You
just nit pick and hang on to any little slip up I make and then think
this is enough to discount the main points without further
consideration. The situation is quite clear, however. The ratio of
potentially beneficial vs. potentially nonbeneficial sequences does
indeed decrease exponentially with each additional minimum size and
specificity requirement. This produces a linear increase in the
distance between what is and the next closest potentially beneficial
sequence in sequence space. A linear increase in this average distance
results in an exponential increase in the average time involved to
find novel beneficial functions at higher and higher levels of
functional complexity. This is exactly what we see happening in "real
life".
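
The purely arithmetic core of that last claim can be written out in a few
lines of Python. Under the simplest blind-search reading of the argument -
a "gap" of d residues means d additional positions that must all be
matched exactly, and each draw is an independent random guess from a
20-letter alphabet - the mean number of draws is 20**d, which grows
exponentially as d grows linearly. Whether that model describes real
evolutionary search is exactly what is disputed in this thread; the
sketch only shows what the model itself predicts:

for d in range(1, 8):
    mean_draws = 20 ** d    # mean draws to hit one exact d-residue target
    print(d, mean_draws)
# 1 20
# 2 400
# 3 8000
# ...
# 7 1280000000   (linear growth in d, exponential growth in draws)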

Seanpit

unread,
Jul 29, 2006, 6:14:38 AM7/29/06
to

Because, when I used the term "average" by itself, several, including
you, didn't seem to understand that this term is a synonym for the
arithmetic mean - which is the value obtained by dividing the sum of a
set of quantities by the number of quantities in the set.

The mean is one kind of average. Two other common forms of averages are
the mode and the median. The mode is the most frequently occurring value
in a set. The median is the middle value of the set when the values are
ordered by rank.

> >The mode, in this case, is 1, the median just
> >shy of 4, and the mean is 6. Look it up. And, if you don't believe
> >me, do the experiment yourself or, better yet, get snex to explain it
> >to you.
> >
> >> >> >> >Now, how many
> >> >> >> >times would you have to roll the dice, on average, to get "6" twice?
> >> >> >> >You'd have to roll the dice 12 times. That's right, only 12 times.
> >> >> >>
> >> >> >> No, that's not true either.
> >> >> >
> >> >> >What's the mean average then?
> >> >>
> >> >> I suggest you actually do it, and find out.
> >
> >The answer is 12. That is the mean average number of dice throws it
> >would take to get six twice.
>
> Run Data Count
> --------------------------------------------------------
> 1 - 6,1,6 3 [1]
> 2 - 1,3,4,4,5,5,6,2,5,4,5,1,2,5,3,6 16
> 3 - 6,1,3,1,3,6 6 [1]
> 4 - 3,3,5,3,6,4,2,5,6 9
> 5 - 4,1,6,6 4
> 6 - 3,6,6 3
> 7 - 4,5,3,1,1,5,5,1,3,6,1,5,5,4,1,2,5,4,6 19
> 8 - 3,2,3,3,3,1,2,4,3,6,2,4,5,3,4,2,4,6 18
> 9 - 4,4,6,4,5,1,2,6 8
> 10 - 3,2,6,3,2,2,1,2,2,6 10
> 11 - 4,4,1,4,3,1,4,4,4,2,3,3,3,2,1,2,5,5,5,6,1,2,6 23
> 12 - 1,5,2,6,3,3,5,1,5,3,3,3,2,5,1,5,6 17
> 13 - 6,1,5,6 4 [1]
> 14 - 2,6,2,2,6 5
> 15 - 6,3,6 3 [1]
> 16 - 6,2,4,3,4,1,1,1,4,2,6 11 [1]
> 17 - 5,2,4,2,1,4,5,6,2,1,6 11
> 18 - 2,3,6,3,2,5,6 7
> 19 - 2,3,5,6,5,4,5,4,5,3,3,4,5,2,1,3,6 17
> 20 - 1,6,4,1,6 5
> 21 - 1,5,4,2,5,2,2,3,3,2,6,1,5,3,6 15
> 22 - 1,6,2,6 4
> 23 - 2,5,4,6,3,6 6
> 24 - 2,1,6,2,3,1,4,5,4,6 10
> 25 - 6,2,1,6 4 [1]
> ========================================================
> Total Throws = 238

[1, 1, 1, 1, 1, 1, 3, 4, 4, 5, 5, 6, 7, 8, 9, 10, 10, 11, 15, 16, 17,
17, 18, 19, 23]

[ sum = 213 ]

> Mean = 9.52
> Median = 8
> Mode = 4

Two problems: you didn't run the experiment nearly long enough, and you
didn't stop a set when "6" was thrown first - which was recorded 6
times in your data set out of 25 sets. This results in the following
changes to your "averages".

Mean = 8.52
Median = 7
Mode = 1

Now, if you keep going with more trials, you will find that the numbers
get closer and closer to:

Mean = 6
Median = slightly less than 4
Mode = 1
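
Those limiting values are easy to check with a short Python simulation of
"roll one die until the first 6 appears" (100,000 repetitions is
arbitrary, just enough for the sample statistics to settle down):

import random
from statistics import mean, median, mode

def rolls_until_six():
    # roll a fair die repeatedly and count the rolls up to the first 6
    n = 0
    while True:
        n += 1
        if random.randint(1, 6) == 6:
            return n

counts = [rolls_until_six() for _ in range(100000)]
print(mean(counts))    # close to 6
print(median(counts))  # 4 for this discrete distribution
print(mode(counts))    # 1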

Zachriel

unread,
Jul 29, 2006, 8:22:51 AM7/29/06
to

"Seanpit" <seanpi...@naturalselection.0catch.com> wrote in message
news:1154095686.4...@75g2000cwc.googlegroups.com...


All genetic codons are being mutated all the time (over sufficiently long
time scales). We are talking about neutral changes in two (or more) codons.
So your marble analogy is as if there are two bins with sixty-four marbles,
one black to represent the target, the others white, and we constantly pull
a marble from each bin then return it. (Not simultaneously, but at a similar
rate from each, "in effect simultaneous".)


<snip>


>> and then provide empirical evidence
>> and an explanatory model of still other natural mechanisms such "that
>> organization into higher-order fundamental units such as nucleic acids,
>> the
>> genetic code, secondary and tertiary structure, cellular
>> compartmentalization, cell types, and germ layers allows systems to
>> escape
>> complexity barriers and potentiates explosions in diversity".
>> http://www.pnas.org/cgi/content/full/96/6/2591
>
> None of these stories about how evolution is supposed to create
> high-level functions has ever been demonstrated in real life or even on
> paper in a remotely tenable way. These notions are just as mixed up as
> yours are when you promote your word evolution programs as actually
> modeling random mutations


Word Mutagenation didn't model biology, but your own flawed word-game
analogy.

> (the rate of which is much different for
> different kinds of mutations) and selection based on a gained
> beneficial meaning or function.


The rate and types of mutations (point or recombination) are adjustable in
the Word Mutagenation software. Turns out it's not particularly critical as
long as there is some of each.


> With these limitations in place, the
> ratio of beneficial vs. non-beneficial does indeed dramatically
> decrease with each step up the ladder of minimum size and specificity
> requirements.


Your predictions in this regard were off by many orders of magnitude.
Millions to zillions.


> This dramatic decrease in ratio creates linearly
> increasing gaps sizes between what is and the next closest potentially
> beneficial sequence at the same level or greater. And, a linear
> increase in the gap size translates into an exponential increase in the
> mean time before such a sequence will be found via any sort of random
> search.
>
>> Also, keep in mind that ignorance is not evidence, and we can't make a
>> scientific assertion of an Intelligent Designer every time there is a Gap
>> in
>> human knowledge.
>
> Scientific method is fundamentally based on ignorance as evidence.


No it's not.


> The
> lack of evidence that counters a theory, a theory, which is actually
> open to falsification, is part of the value of a useful theory. In
> other words, the lack of negative evidence that goes counter to a
> theory helps to increase the predictive value of the theory.


That's a very poor statement of how science works. Valid scientific theories
make positive predictions, as well as falsifiable ones.

Seanpit

unread,
Jul 29, 2006, 12:06:47 PM7/29/06
to

Zachriel wrote:

> >> And then argue that evolution requires two
> >> directly consecutive (in effect simultaneous)
> >> successful draws to cross the
> >> neutral gap ('product').
> >
> > That is not my argument at all. Evolution does not require consecutive
> > events to achieve success. Not at all. However, evolution does
> > require that the "correct" parts that were there before are still there
> > by the time the last needed part is realized. This becomes
> > exponentially more difficult to do if every part in a system must be
> > specifically arranged relative to every other part. For a functional
> > system of a given size, reducing this specificity requirement, even a
> > little bit, dramatically reduces the mean time needed to find all the
> > needed parts of that system in sequence space.
>
> All genetic codons are being mutated all the time (over sufficiently long
> time scales). We are talking about neutral changes in two (or more) codons.
> So your marble analogy is as if there are two bins with sixty-four marbles,
> one black to represent the target, the others white, and we constantly pull
> a marble from each bin then return it. (Not simultaneously, but at a similar
> rate from each, "in effect simultaneous".)

The problem is that for higher-level functions, more bins have to be
black at the same time (i.e., simultaneously "correct") than for
lower-level functions.

> <snip>
> >> and then provide empirical evidence
> >> and an explanatory model of still other natural mechanisms such "that
> >> organization into higher-order fundamental units such as nucleic acids,
> >> the
> >> genetic code, secondary and tertiary structure, cellular
> >> compartmentalization, cell types, and germ layers allows systems to
> >> escape
> >> complexity barriers and potentiates explosions in diversity".
> >> http://www.pnas.org/cgi/content/full/96/6/2591
> >
> > None of these stories about how evolution is supposed to create
> > high-level functions has ever been demonstrated in real life or even on
> > paper in a remotely tenable way. These notions are just as mixed up as
> > yours are when you promote your word evolution programs as actually
> > modeling random mutations
>
> Word Mutagenation didn't model biology, but your own flawed word-game
> analogy.

Your computer programs didn't even model my word-game analogy. Your
parameters weren't even close, and they didn't select for meaning or
beneficial function - only for identity to a pre-established template.


In any case, your programs do give a pretty good idea of one thing.
Lower level functions are more interconnected than are higher-level
functions. Like pieces of sticky bubble gum, as you move up the ladder
of functional complexity, the clustered islands of beneficial sequences
will start to tear apart. As this happens, the average distance
between what is and the next closest beneficial island will start to
increase in a non-linear fashion - slowly at first, but then more and
more rapidly. Eventually this increase in linear distance will go back
to increasing in a linear fashion as the islands become more and more
isolated from each other.

> > (the rate of which is much different for
> > different kinds of mutations) and selection based on a gained
> > beneficial meaning or function.
>
> The rate and types of mutations (point or recombination) are adjustable in
> the Word Mutagenation software. Turns out it's not particularly critical as
> long as there is some of each.
>
>
> > With these limitations in place, the
> > ratio of beneficial vs. non-beneficial does indeed dramatically
> > decrease with each step up the ladder of minimum size and specificity
> > requirements.
>
> Your predictions in this regard were off by many orders of magnitude.
> Millions to zillions.

This is seemingly true at very low levels. However, as you move up the
ladder, especially if you are selecting for beneficial sequences (not
just any sequence in the dictionary), the ratio drops off even more
dramatically and the required time increases at a greater and
greater exponential rate - not the seemingly linear rate as might be
supposed by only looking at differences between rates at very low
levels.

> > This dramatic decrease in ratio creates linearly
> > increasing gaps sizes between what is and the next closest potentially
> > beneficial sequence at the same level or greater. And, a linear
> > increase in the gap size translates into an exponential increase in the
> > mean time before such a sequence will be found via any sort of random
> > search.
> >
> >> Also, keep in mind that ignorance is not evidence, and we can't make a
> >> scientific assertion of an Intelligent Designer every time there is a Gap
> >> in
> >> human knowledge.
> >
> > Scientific method is fundamentally based on ignorance as evidence.
>
> No it's not.

Yes, it is. If not for ignorance, the scientific method would not be
needed.

> > The
> > lack of evidence that counters a theory, a theory, which is actually
> > open to falsification, is part of the value of a useful theory. In
> > other words, the lack of negative evidence that goes counter to a
> > theory helps to increase the predictive value of the theory.
>
> That's a very poor statement of how science works. Valid scientific theories
> make positive predictions, as well as falsifiable ones.

That's true. But, if a theory does not make falsifiable predictions, it
is not a valid scientific theory - at least not according to those like
Popper and many others. That part of the theory must be in place. The
potential for falsification is what makes science so useful. Ignorance
of information that might actually falsify a theory is a very important
and useful type of ignorance.

Seanpit

unread,
Jul 29, 2006, 12:22:06 PM7/29/06
to

Gerry Murphy wrote:
> "Seanpit" <seanpi...@naturalselection.0catch.com> wrote in message
> news:1153882557.5...@i42g2000cwa.googlegroups.com...

> >
> > Gerry Murphy wrote:
> > > "Seanpit" <seanpi...@naturalselection.0catch.com> wrote in message
> > > news:1153788716.3...@p79g2000cwp.googlegroups.com...
> > > >
> > > <snip>
> > >
> > > > What is the mean average number of draws I'd have to make?
> > >
> > > Where the hell do you get this nomenclature, "mean average"?
> > > Your using the same textbook as nando, aren't you?
> >
> > There are different types of averages - right? The mean, median, and
> > mode?
> >
>
> Yes, and that's how we describe them. Tacking on 'average' adds nothing.

I suggested that the average number of rolls of a dice needed to get
"6" would be six rolls. When one uses the word "average" it is
generally understood that one is talking about the arithmetic mean,
which is the "standard" average. And, the mean number of rolls needed
to get a "six" when throwing a single dice is indeed six throws.

Yet, you responded by saying, "You really should stop embarrassing
yourself with this display of breathtaking ignorance."

Why accuse me of being not only ignorant here, but breathtakingly
ignorant? ; ) Your calculations were of the median, not the mean - the
standard arithmetic mean - which is indeed six rolls of the dice in
this case.

Sean Pitman
www.DetectingDesign.com

Zachriel

unread,
Jul 29, 2006, 12:25:13 PM7/29/06
to

"Seanpit" <seanpi...@naturalselection.0catch.com> wrote in message
news:1154189207.7...@i3g2000cwc.googlegroups.com...


Then my original restatement of your position was correct.


>
>> <snip>
>> >> and then provide empirical evidence
>> >> and an explanatory model of still other natural mechanisms such "that
>> >> organization into higher-order fundamental units such as nucleic
>> >> acids,
>> >> the
>> >> genetic code, secondary and tertiary structure, cellular
>> >> compartmentalization, cell types, and germ layers allows systems to
>> >> escape
>> >> complexity barriers and potentiates explosions in diversity".
>> >> http://www.pnas.org/cgi/content/full/96/6/2591
>> >
>> > None of these stories about how evolution is supposed to create
>> > high-level functions has ever been demonstrated in real life or even on
>> > paper in a remotely tenable way. These notions are just as mixed up as
>> > yours are when you promote your word evolution programs as actually
>> > modeling random mutations
>>
>> Word Mutagenation didn't model biology, but your own flawed word-game
>> analogy.
>
> Your computer programs didn't even model my word-game analogy. Your
> parameters were even close and they didn't select for meaning or
> beneficial function - only for identity to a pre-established template.


Um, your game selected for words in the dictionary. That was your selection
criteria - not mine.


>
>
> In any case, your programs do give a pretty good idea of one thing.
> Lower level functions are more interconnected than are higher-level
> functions. Like pieces of sticky bubble gum, as you move up the ladder
> of functional complexity, the clustered islands of beneficial sequences
> will start to tear apart. As this happens, the average distance
> between what is and the next closest beneficial island will start to
> increase in a non-linear fashion - slowly at first, but then more and
> more rapidly. Eventually this increase in linear distance will go back
> to increasing in a linear fashion as the islands become more and more
> isolated from each other.


They don't seem to be isolated at all. Words seem to be highly organized
into interconnected families. And many long words are made of components
found in shorter words, allowing the rapid accretion of compound words.


>
>> > (the rate of which is much different for
>> > different kinds of mutations) and selection based on a gained
>> > beneficial meaning or function.
>>
>> The rate and types of mutations (point or recombination) are adjustable
>> in
>> the Word Mutagenation software. Turns out it's not particularly critical
>> as
>> long as there is some of each.
>>
>>
>> > With these limitations in place, the
>> > ratio of beneficial vs. non-beneficial does indeed dramatically
>> > decrease with each step up the ladder of minimum size and specificity
>> > requirements.
>>
>> Your predictions in this regard were off by many orders of magnitude.
>> Millions to zillions.
>
> This is seemingly true at very low levels. However, as you move up the
> ladder, especially if you are selecting for beneficial sequences (not
> just any sequence in the dictionary) the ratio drops off even more
> dramatically and the increase in time increases at a greater and
> greater exponential rate - not the seemingly linear rate as might be
> supposed by only looking at differences between rates at very low
> levels.


It's not linear by any means. Just not zillions. In any case, as words are
of limited length, your word-game analogy is fatally flawed and should be
abandoned.


>
>> > This dramatic decrease in ratio creates linearly
>> > increasing gaps sizes between what is and the next closest potentially
>> > beneficial sequence at the same level or greater. And, a linear
>> > increase in the gap size translates into an exponential increase in the
>> > mean time before such a sequence will be found via any sort of random
>> > search.
>> >
>> >> Also, keep in mind that ignorance is not evidence, and we can't make a
>> >> scientific assertion of an Intelligent Designer every time there is a
>> >> Gap
>> >> in
>> >> human knowledge.
>> >
>> > Scientific method is fundamentally based on ignorance as evidence.
>>
>> No it's not.
>
> Yes, it is. If not for ignorance, the scientific method would not be
> needed.


That is not what you previously stated. You said it was "based on ignorance
as evidence", and your latter statement does not support your previous one.


>
>> > The
>> > lack of evidence that counters a theory, a theory, which is actually
>> > open to falsification, is part of the value of a useful theory. In
>> > other words, the lack of negative evidence that goes counter to a
>> > theory helps to increase the predictive value of the theory.
>>
>> That's a very poor statement of how science works. Valid scientific
>> theories
>> make positive predictions, as well as falsifiable ones.
>
> That's true. But, if a theory does not make falsifiable predictions, it
> is not a valid scientific theory - at least not according to those like
> Popper and many others. That part of the theory must be in place. The
> potential for falsification is what makes science so useful. Ignorance
> of information that might actually falsify a theory is a very important
> and useful type of ignorance.


Falsification is a specific test with an actual result. It is not a gap in
knowledge.

Richard Forrest

unread,
Jul 29, 2006, 2:05:08 PM7/29/06
to

Seanpit wrote:
> > It was a slip which revealed your fundamental lack of education in
> > statistics.
> > Nobody with any familiarity with statistical methods would have made
> > such a fundamental error.
> >
> > This has been confirmed by your use of terms such as "mean average",
> > which are meaningless.
>
> The "mean" is one kind of average. It is computed by summing the values
> and dividing by the number of values. Two other common forms of
> averages are the mode and the median. The mode is the most frequently
> occurring value in a set. The median is the middle value of the set
> when they are ordered by rank.
>
> Evidently many in this thread don't understand that there are different
> kinds of averages and that the use of the term "average", by itself,
> usually refers to the arithmetic mean.

What is evident is that many posters on this thread know far more about
statistics than you do, and that almost all your posts on the subject
contain rather fundamental errors.

That you snip most of the responses pointing out those errors speaks
for itself.

>
> > > What I should have said was that the mean
> > > number of trails needed to achieve to independent events is the sum of
> > > the inverse of the probabilities of each.
> >
> > How the hell do you get from there to "The probabilities of two
> > independent events occurring is the sum of the probabilities of each
> > one occurring - not the product"?
>
> I was thinking about the inverse ratio when I wrote this, but,
> obviously, this isn't what came out. It was a long day and I wasn't
> thinking straight. However, the main point remains the same. The
> average time it takes to produce a system of function where the
> individual parts are not required to be specifically oriented with each
> other is dramatically reduced relative to a system that requires the
> same minimum size as well as specific orientation of all of its parts.
>

Which is an unfounded assertion with no support in the way of evidence
from the real world.

What is clear is that your statistical model is flawed. There are no
discontinuities in the nested hierarchy of living organisms as your
model suggests, the model itself bears little resemblance to that which
any competent biologist would produce, and your evident incompetence in
basic statistics hardly adds to the strength of your argument.

> > > This little mistake makes
> > > little difference to the main point at hand in any case - which you
> > > evidently have yet to appreciate.
> >
> > The main "point at hand" here is that you are proposing a statistical
> > argument against evolution, yet you are so evidently incompetent in
> > using statistics that you have had to be corrected with each statement
> > you make.
> >
> > I don't understand why you don't go away and educate yourself on the
> > subject.
>
> Please, do tell me how I'm so far off base when it comes to the main
> point of this thread? With your expertise in statistics, this should be
> easy to do.

Where you are so far off base is that your model bears little
resemblance to any model of evolutionary theory, and that it makes the
prediction that there will be discontinuities in the nested hierarchy
of living organisms.

There are no such discontinuities, so your model fails.

This is the point at which you need to start to rethink your model, and
figure out why it fails to match what we find in the real world rather
than insisting that the real world is wrong and your model is correct.

>
> > > > So I suggest that you go away and learn something about the subject to
> > > > avoid making a fool of yourself again.
> > >
> > > And I suggest you pay attention to the main point of the discussion
> > > instead of getting hung up on minor tangents.
> >
> > The "main point of discussion" is that you are arguing in the basis of
> > a flawed statistical analysis of a model which which bears little
> > resemblance to that which any competent biologist would use that
> > evolution has limits.
>
> Please list this your statistic counter argument then.

Why? The model fails to make any predictions which can be matched by
observation of the real world. As for the statistical competence,
numerous posters have pointed out flaws in your argument which you
simply ignore.

>
> > The evidence from the real world shows no such limits.
>
> Yes, it does. Where is your example of a novel function evolving that
> requires more than a few thousand fairly specified residues of genetic
> real estate?

Where is the evidence for discontinuities in the nested hierarchy of
living organisms?

There are numerous examples in which the evolution of a particular
system - the mammalian jaw, for example - is strongly supported by the
fossil evidence. If you have a better explanation for the series of
fossils which demonstrate the sequence, and which can be tested against
the evidence, produce it. Furthermore, we know from experiments with
fruit flies that changes in single genes can cause halteres to develop
as wings. This shows that at some time in the past, those halteres were
wings, and that through evolutionary processes they were changed into
halteres, a modification which could not have taken place without
changing more than "a few thousand specified residues of genetic real
estate" - an invention of yours which you have failed to define in any
meaningful way, incidentally.

If you have a better, testable explanation for this and numerous other
evidences of major evolutionary change, feel free to produce it.
Meaningless assertions about "common design" don't count. It is a
concept you have invented, which you have not defined, which is not based on
the characteristics of any collection of work from a known designer,
and which makes no predictions whatsoever.

>
> > If you want to form an argument, you need to demonstrate that the
> > interpretation of the evidence from the real world is flawed, and
> > provide a model which explains that evidence more robustly than the
> > models used by evolutionary scientists.
>
> I have. The evidence from the real world really does demonstrate an
> exponential stalling out effect of evolutionary progress on the lowest
> rungs of the ladder of functional complexity.

No it doesn't.

It shows a more or less unbroken sequence of morphologies which match
the stages which evolutionary theory would predict. We find them for
every organ in the body, from the eye to the heart.

> You don't seem to
> understand how the fitness landscape is affected by the exponential
> decline in the ratio of potentially beneficial vs. potentially
> nonbeneficial.

You don't seem to understand that you are making predictions about the
shape of that fitness landscape which are not supported by the
evidence.

When your model doesn't fit you need to abandon your model, not insist
that your model is right and the real world is wrong.

>
> > If you can't, all you are demonstrating is that your model is flawed -
> > which given your incompetence in basic statistics is hardly surprising.
>
> You have yet to show that my model is fundamentally flawed at all.

The principal prediction your model makes - i.e. discontinuities in the
nested hierarchy of living organisms - is not matched by the evidence.
If the evidence from the real world does not match the predictions of
your theory, your theory is flawed.

> You
> just nit pick and hang on to any little slip up I make and then think
> this is enough to discount the main points without further
> consideration. The situation is quite clear, however. The ratio of
> potentially beneficial vs. potentially nonbeneficial sequences does
> indeed decrease exponentially with each additional minimum size and
> specificity requirement. This produces a linear increase in the
> distance between what is and the next closest potentially beneficial
> sequence in sequence space. A linear increase in this average distance
> results in an exponential increase in the average time involved to
> find novel beneficial functions at higher and higher levels of
> functional complexity. This is exactly what we see happening in "real
> life".

We do?

Where are the discontinuities your model predicts in the nested
hierarchy of living organisms?

RF

Zachriel

unread,
Jul 29, 2006, 2:21:07 PM7/29/06
to

"Seanpit" <seanpi...@naturalselection.0catch.com> wrote in message
news:1154190126....@m73g2000cwd.googlegroups.com...

>
> Gerry Murphy wrote:
>> "Seanpit" <seanpi...@naturalselection.0catch.com> wrote in message
>> news:1153882557.5...@i42g2000cwa.googlegroups.com...
>> >
>> > Gerry Murphy wrote:
>> > > "Seanpit" <seanpi...@naturalselection.0catch.com> wrote in
>> > > message
>> > > news:1153788716.3...@p79g2000cwp.googlegroups.com...
>> > > >
>> > > <snip>
>> > >
>> > > > What is the mean average number of draws I'd have to make?
>> > >
>> > > Where the hell do you get this nomenclature, "mean average"?
>> > > Your using the same textbook as nando, aren't you?
>> >
>> > There are different types of averages - right? The mean, median, and
>> > mode?
>> >
>>
>> Yes, and that's how we describe them. Tacking on 'average' adds nothing.
>
> I suggested that the average number of rolls of a dice needed to get
> "6" would be six rolls. When one uses the word "average" it is
> generally understood that one is talking about the arithmetic mean,
> which is the "standard" average.


Actually, in this case, "average" would probably not be interpreted as the
mean. In other words, if you were to lay even odds on whether it would be at
least six rolls before the first six appeared, you would (probably) lose.
And this is exactly how most people would interpret the bet. There is a
slightly better than even chance of rolling the first six in the first four
rolls, the median.

1, 16.67%
2, 30.56%
3, 42.13%
4, 51.77%
5, 59.81%
6, 66.51%
7, 72.09%
8, 76.74%
9, 80.62%
10, 83.85%
11, 86.54%
12, 88.78%
13, 90.65%
14, 92.21%
15, 93.51%
16, 94.59%
17, 95.49%
18, 96.24%
19, 96.87%
20, 97.39%


Interestingly, one is the mode!
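
That table is the cumulative distribution of a geometric variable with
p = 1/6: the chance that the first six has appeared by roll n is
1 - (5/6)^n, so the whole table can be regenerated in two lines of Python:

for n in range(1, 21):
    pct = (1 - (5 / 6) ** n) * 100   # chance the first 6 arrives by roll n
    print(n, round(pct, 2))
# 1 16.67, 2 30.56, 3 42.13, 4 51.77, ... 20 97.39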


> And, the mean number of rolls needed
> to get a "six" when throwing a single dice is indeed six throws.
>
> Yet, you responded by saying, "You really should stop embarrassing
> yourself with this display of breathtaking ignorance."


Certainly imprecise. But I didn't miss any inhalations.

--
Zachriel, angel that rules over memory, presides over the planet Jupiter.
http://zachriel.blogspot.com/

Don Cates

unread,
Jul 29, 2006, 3:21:49 PM7/29/06
to
On Mon, 24 Jul 2006 22:16:44 -0400, Augray <aug...@sympatico.ca>
posted:

>On 24 Jul 2006 17:21:32 -0700, "Seanpit"
><seanpi...@naturalselection.0catch.com> wrote in
><1153786892....@75g2000cwc.googlegroups.com> :
>
>>Augray wrote:
>>> On 24 Jul 2006 08:26:18 -0700, "Seanpit"
>>> <seanpi...@naturalselection.0catch.com> wrote in
>>> <1153754778.5...@i42g2000cwa.googlegroups.com> :
>>>
>>> >
>>> >Augray wrote:
>>> >> On 24 Jul 2006 00:33:44 -0700, "Richard Forrest"
>>> >> <ric...@plesiosaur.com> wrote in
>>> >> <1153726424....@m79g2000cwm.googlegroups.com> :
>>> >>
>>> >> >
>>> >> >Seanpit wrote:
>>> >> ><snipped>


>>> >> >>
>>> >> >> The probabilities of two independent events occurring is the sum of the
>>> >> >> probabilities of each one occurring - not the product.
>>> >> >>
This is, of course, completely wrong. But it is not what Sean has been
talking about in most of the thread. He is not talking probabilities
but number of trials. He gets his terminology hopelessly confused at
times and makes, as above, some real howlers, but the idea (IIUC) that
he is trying to get across is correct. (more below)


>>> >> >
>>> >> >For crying out loud, Sean, learn some basic mathematics.
>>> >> >
>>> >> >You are making an argument based on probabilities, yet are so freaking
>>> >> >ignorant about the mathematics of probability that you make an
>>> >> >assertion as stupid as this?
>>> >> >
>>> >> >Get real.
>>> >>

>>> >> Yes, this is jaw-dropping stuff. What does Sean believe the
>>> >> probability is of two independent, yet certain, events occurring to
>>> >> be? 2 in 1?


>>> >
>>> >The odds that they will occur in a given set of a sum of their
>>> >independent odds is 50:50.
>>>

>>> Please note that I used the term "certain" to describe these events.
>>> Since the odds of such an event happening is 1 chance in 1, can I
>>> therefore conclude that the chance of two certain events happening is
>>> 2 in 1?
>>
>>That's not the question I'm asking here.
>
But it is, unfortunately, what you actually stated in the fragment I
replied to above.

>Nevertheless, it's a very illustrative example. Another example is
>rolling two dice at a time. The chance of getting two 6s is 1 in 36,
>but according to you it wou1d be 2 in 6.
>
Which refers to what he stated, but not what he meant. (which you
could have gotten from a more complete reading of posts)


>
>>> >For example, how many times would you have
>>> >to role a dice to get the side with six dots on it? You'd have to roll
>>> >the dice 6 times, on average.
>>>
>>> No, that's just wrong. Have you never rolled a six on the first try?
>>

He is correct here. On average, six rolls (not role) of a die (not
dice) to get a six. The *most likely* number of rolls is less.

>>Sure, but it is also possible to fail to roll a six even after a
>>million tries. This puts the overall odds, the mean average, at 6
>>rolls to get a 6.
>

There are no "odds" here, just average (not "mean average") number of
rolls. No "odds", no 'probabilities'; a 'count' of events only.

>Please show your math.
>
1/6 of the time - 1st roll (5/6 of the times remaining)
5/36 of the time - 2nd roll (1/6 * 5/6) (36/36 - 6/36 - 5/36 = 25/36 of the
times remaining)
25/216 of the time - 3rd roll (1/6 * 25/36)
etc.
average # of rolls = 6
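
Summed out, that series does converge on six rolls. A short Python check,
truncating the infinite sum at 1000 terms (more than enough for the tail
to vanish):

# P(first 6 on roll n) = (1/6) * (5/6)**(n-1); the expectation is the sum
# of n times that probability.
expected = sum(n * (1 / 6) * (5 / 6) ** (n - 1) for n in range(1, 1001))
print(expected)   # 5.999999...  i.e. an average of six rolls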


>
>>> >Are you with me so far?
>>>
>>> Nope.
>>

>>What is the mean average number of rolls of a dice needed to get a six?
>
I do not know why he is using "mean average". In this case it makes no
sense. He should use one or the other, not both. (and 'die' is the
singular of 'dice'; personal nit-pick)

>Between 3 and 4.
>
This is the most likely number of rolls not the average number.


>
>>> >Now, how many
>>> >times would you have to roll the dice, on average, to get "6" twice?
>>> >You'd have to roll the dice 12 times. That's right, only 12 times.
>>>
>>> No, that's not true either.
>>

Yes it is true.

>>What's the mean average then?
>

Arrrgh, not "mean average".

>I suggest you actually do it, and find out.
>
>

>>> >Do
>>> >the experiment yourself and you will see that this is true. If you
>>> >roll the dice 12 times, on average, you will roll the number six twice
>>> >per set of 12 rolls.
>>>
>>> That's a different experiment.
>>

But it is the experiment he was originally referring to; he just got
confused in his terminology along the way.

>>This is what this entire discussion is about - -
>
>Then you should make up your mind as to what it's about. First you ask
>for the average number of rolls to get 6 twice, and then you ask for
>the number of times 6 will appear in runs of 12 rolls each. These are
>two different questions.
>

They might be, but in the context of his (correct) claim that the
answer to the first question is 12, the second question is the same
claim.
>
While he is correct on this bit of math, I am not sure of its
relevance to the overall discussion.
--
Don Cates ("he's a cunning rascal" - PN)

Don Cates

unread,
Jul 29, 2006, 3:21:53 PM7/29/06
to
On 24 Jul 2006 20:49:29 -0700, "Seanpit"
<seanpi...@naturalselection.0catch.com> posted:

>
>Augray wrote:
>
>< snip >


>
>> >> >For example, how many times would you have
>> >> >to role a dice to get the side with six dots on it? You'd have to roll
>> >> >the dice 6 times, on average.
>> >>
>> >> No, that's just wrong. Have you never rolled a six on the first try?
>> >

>> >Sure, but it is also possible to fail to roll a six even after a
>> >million tries. This puts the overall odds, the mean average, at 6
>> >rolls to get a 6.
>>

>> Please show your math.
>
>The median number of rolls needed to get a six is just under 4 rolls.
>In other words, 50% of rolls will be less and 50% more before six is
>realized. However, the 50% of rolls that are more may be a whole lot
>more - up to infinity. This results in a skewed graph where the mean,
>median, and mode are not the same. The mean will have a greater value
>than the median - or so it seems to me. In fact, it seems to me that
>the mean will be 6 in this case while the median will be just under 4.
>
>The theoretical distribution for large numbers of throws of unbiased
>dice has: mode = 1, median = 4. (The mean, not calculated here, is 6.)
>
>http://www.rsscse.org.uk/pose/level1/book5/notes.htm


>
>> >> >Are you with me so far?
>> >>
>> >> Nope.
>> >
>> >What is the mean average number of rolls of a dice needed to get a six?
>>

>> Between 3 and 4.
>
>Are both the mean and median between 3 and 4?
>

>> >> >Now, how many
>> >> >times would you have to roll the dice, on average, to get "6" twice?
>> >> >You'd have to roll the dice 12 times. That's right, only 12 times.
>> >>
>> >> No, that's not true either.
>> >

>> >What's the mean average then?
>>

>> I suggest you actually do it, and find out.
>
>> >> >Do
>> >> >the experiment yourself and you will see that this is true. If you
>> >> >roll the dice 12 times, on average, you will roll the number six twice
>> >> >per set of 12 rolls.
>> >>
>> >> That's a different experiment.
>> >

>> >This is what this entire discussion is about - -
>>
>> Then you should make up your mind as to what it's about. First you ask
>> for the average number of rolls to get 6 twice, and then you ask for
>> the number of times 6 will appear in runs of 12 rolls each. These are
>> two different questions.
>

>They are very much related to the notion that two short sequences can
>be found in sequence space just as easily as one long sequence of a
>size equivalent to the sum of the two short sequences.
>

It depends. Are we looking in the full sequence space or just the bit
you can actually reasonably reach from an existing sequence space? Are
the two short sequences completely independent of each other? Is the
long sequence anything like a concatenation of shorter sequences?
I am sure there are many more relevant questions.
