Inferring phylogenies + duplication and divergence

Perplexed in Peoria

Apr 19, 2007, 4:49:45 PM
A paper on the evolution of the bacterial flagellum (here and here)
http://www.pnas.org/cgi/content/abstract/0700266104v1
http://sciencenow.sciencemag.org/cgi/content/full/2007/417/3

has triggered some critical comments from bloggers (here and here).
http://www.pandasthumb.org/archives/2007/04/flagellum_evolu_1.html#more
http://genomicron.blogspot.com/2007/04/genome-sequences-reduce-complexity-of.html

And here is a blogger who has bad feelings about all this instant blogging.
http://scienceblogs.com/loom/2007/04/17/when_scientists_go_all_bloggy.php

Interesting stuff. I suppose discussion of the relationship of all this
to Behe/Miller/Matzke probably belongs over on talk.origins. But I am
curious about some methodological questions related to inference about
gene duplication and divergence and its relationship to phylogenetic inference.

To oversimplify a bit, the basic claim of the flagellum paper is that flagella
in some 38 bacterial species all contain "the same" 24 core proteins. And
that those 24 proteins all arose from a single ancestral protein which
duplicated and diverged in a pre-LUCA organism. And then that those 24
protein genes underwent further evolution as the LUCA branched over time
into the 38 species.

This kind of thing has been done before, of course, with tRNAs by Eigen's group
many decades ago, and many times since. But my first question is whether
there is a good review paper saying how one ought to go about it, and what
are the pitfalls?

One way of thinking about the problem is to build a matrix with (say) 38 rows
and 24 columns. E. coli, for example, gets row #3. And one of the 24 genes,
FlgA say, gets column #5. We have 24x38 gene sequences in our database.

Now, one way to proceed is to concatenate all of the sequences in each row,
and then build a tree of rows for the phylogeny. Then, concatenate all
of the sequences in each column and build a separate tree of columns for the
gene duplication/divergence hypotheses. Is this valid? I realize that
you have to somehow make sure that the alignments in the rows and columns
match up, but is there anything else that needs to be done?
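
To make the bookkeeping concrete, here is a minimal Python sketch of the row and column concatenation, using a made-up 3x2 table in place of the real 38x24 one. The species names, gene names, sequences, and function names are all illustrative assumptions and are not taken from the paper.

```python
# Toy stand-in for the 38 x 24 table: species (rows) by gene family (columns).
# All names and sequences are invented; real cells would hold aligned sequences.
species = ["sp1", "sp2", "sp3"]
genes = ["geneA", "geneB"]

# table[species][gene] -> aligned sequence. Within one column (gene family)
# every entry has the same aligned length.
table = {
    "sp1": {"geneA": "MKT-L", "geneB": "AVL"},
    "sp2": {"geneA": "MRT-L", "geneB": "AIL"},
    "sp3": {"geneA": "MKSAL", "geneB": "GVL"},
}


def concatenate_rows(table, species, genes):
    """One long sequence per species: the input for a tree of rows."""
    return {sp: "".join(table[sp][g] for g in genes) for sp in species}


def concatenate_columns(table, species, genes):
    """One long sequence per gene family: the input for a tree of columns.

    This only makes sense if the families are also aligned to each other,
    so every cell in the table must have the same aligned length.
    """
    lengths = {len(table[sp][g]) for sp in species for g in genes}
    if len(lengths) != 1:
        raise ValueError("gene families are not aligned to a common length")
    return {g: "".join(table[sp][g] for sp in species) for g in genes}


rows = concatenate_rows(table, species, genes)
```

With this toy table, rows["sp1"] is "MKT-LAVL", while concatenate_columns(table, species, genes) raises ValueError because geneA and geneB are aligned to different lengths. That is the "alignments must match up" requirement in code form: the tree of rows only needs each column aligned internally, but the tree of columns needs all 24 families aligned against each other.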

Of course, the problem becomes more complex if you don't know (or don't assume)
that all of the gene duplication/divergences took place before all of the
organism lineage branches. Or if some of the diverged genes have been lost
in some lineages. Felsenstein's book touches briefly on some of the issues
here, but I am wondering whether there is a more complete treatment anywhere.

Also, I am curious whether phylogeny experts here agree with the bloggers and
with my own intuitions that this flagellum paper seriously overstates its case.

ErikW

Apr 20, 2007, 1:20:31 PM
On Apr 19, 10:49 pm, "Perplexed in Peoria" <jimmene...@sbcglobal.net>
wrote:
> A paper on the evolution of the bacterial flagellum (here and here)
> http://www.pnas.org/cgi/content/abstract/0700266104v1
> http://sciencenow.sciencemag.org/cgi/content/full/2007/417/3
>
> has triggered some critical comments from bloggers (here and here).
> http://www.pandasthumb.org/archives/2007/04/flagellum_evolu_1.html#more
> http://genomicron.blogspot.com/2007/04/genome-sequences-reduce-comple...
>
> And here is a blogger who has bad feelings about all this instant blogging.
> http://scienceblogs.com/loom/2007/04/17/when_scientists_go_all_bloggy...
>
> Interesting stuff. I suppose discussion of the relationship of all this
> to Behe/Miller/Matzke probably belongs over on talk.origins. But I am
> curious about some methodological questions related to inference about
> gene duplication and divergence and its relationship to phylogenetic inference.
>
> To oversimplify a bit, the basic claim of the flagellum paper is that flagella
> in some 38 bacterial species all contain "the same" 24 core proteins. And
> that those 24 proteins all arose from a single ancestral protein which
> duplicated and diverged in a pre-LUCA organism. And then that those 24
> protein genes underwent further evolution as the LUCA branched over time
> into the 38 species.
>
> This kind of thing has been done before, of course, with tRNAs by Eigen's group
> many decades ago, and many times since. But my first question is whether
> there is a good review paper saying how one ought to go about it, and what
> are the pitfalls?
>
> One way of thinking about the problem is to build a matrix with (say) 38 rows
> and 24 columns. E. coli, for example, gets row #3. And one of the 24 genes,
> FlgA say, gets column #5. We have 24x38 gene sequences in our database.
>
> Now, one way to proceed is to concatenate all of the sequences in each row,
> and then build a tree of rows for the phylogeny. Then, concatenate all
> of the sequences in each column and build a separate tree of columns for the
> gene duplication/divergence hypotheses. Is this valid?

I'm not sure what you want to do, so I'll just propose a way of
analysing it. Start with all the 24x38 sequences in one big tree and
see how they cluster (because similarity, the initial homology criterion
I imagine, isn't the same as phylogenetic relationship). If you're
satisfied with that, that is, you have your 24 clusters, each of
probably homologous proteins, then compare gene-phylogenies (a
separate tree for each column) with each other and with
species-phylogenies (the concatenated paralogue sequences and preferably
also some other non-flagellar sequences) to see if they agree. If they
do, conclude that the duplications occurred in a common ancestor.

I'd be surprised if the results were clearcut in the end :)
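
For the "see if they agree" step, one rough way to put a number on it is to count the clades that two trees do not share (a rooted-tree version of the Robinson-Foulds distance). A minimal Python sketch follows, assuming trees are written as nested tuples of species names; the tree shapes and names here are invented for illustration, and real work would use a phylogenetics package on properly inferred trees.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted tree as frozensets of leaves.

    The tree is written as nested tuples, e.g. (("a", "b"), "c").
    Single leaves and the all-leaves root clade are left out, since
    every tree on the same species shares them.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    all_leaves = leaves(tree)
    found.discard(all_leaves)
    return found


def clade_distance(tree_a, tree_b):
    """Number of clades present in one tree but not in the other."""
    return len(clades(tree_a) ^ clades(tree_b))


# Hypothetical species tree and two hypothetical gene trees on four species.
species_tree = ((("sp1", "sp2"), "sp3"), "sp4")
gene_tree_1 = ((("sp1", "sp2"), "sp3"), "sp4")
gene_tree_2 = ((("sp1", "sp3"), "sp2"), "sp4")

d1 = clade_distance(species_tree, gene_tree_1)
d2 = clade_distance(species_tree, gene_tree_2)
```

Here d1 is 0 (the gene tree matches the species tree) and d2 is 2 (each tree has one clade the other lacks). A distance of zero for every column would fit duplications in a common ancestor followed by vertical descent; nonzero distances need an explanation such as loss, transfer, or plain estimation error.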

> I realize that
> you have to somehow make sure that the alignments in the rows and columns
> match up, but is there anything else that needs to be done?
>
> Of course, the problem becomes more complex if you don't know (or don't assume)
> that all of the gene duplication/divergences took place before all of the
> organism lineage branches. Or if some of the diverged genes have been lost
> in some lineages. Felsenstein's book touches briefly on some of the issues
> here, but I am wondering whether there is a more complete treatment anywhere.
>
> Also, I am curious whether phylogeny experts here agree with the bloggers and
> with my own intuitions that this flagellum paper seriously overstates its case.

If no one answers that, I hope I can get around to reading the paper in a
few days and then pretend to be an expert on protein phylogenies.

ErikW


Perplexed in Peoria

Apr 23, 2007, 12:21:47 AM

"ErikW" <bryo...@hotmail.com> wrote in message news:f0asov$2c8h$1...@darwin.ediacara.org...

> On Apr 19, 10:49 pm, "Perplexed in Peoria" <jimmene...@sbcglobal.net>
> wrote:
> > A paper on the evolution of the bacterial flagellum (here and here)
> > http://www.pnas.org/cgi/content/abstract/0700266104v1
> > http://sciencenow.sciencemag.org/cgi/content/full/2007/417/3
> >
> > has triggered some critical comments from bloggers (here and here).
[snip damaged urls - go to the original posting if you want them]

> >
> > Interesting stuff. I suppose discussion of the relationship of all this
> > to Behe/Miller/Matzke probably belongs over on talk.origins. But I am
> > curious about some methodological questions related to inference about
> > gene duplication and divergence and its relationship to phylogenetic inference.
> >
> > To oversimplify a bit, the basic claim of the flagellum paper is that flagella
> > in some 38 bacterial species all contain "the same" 24 core proteins. And
> > that those 24 proteins all arose from a single ancestral protein which
> > duplicated and diverged in a pre-LUCA organism. And then that those 24
> > protein genes underwent further evolution as the LUCA branched over time
> > into the 38 species.
> >
> > This kind of thing has been done before, of course, with tRNAs by Eigen's group
> > many decades ago, and many times since. But my first question is whether
> > there is a good review paper saying how one ought to go about it, and what
> > are the pitfalls?
> >
> > One way of thinking about the problem is to build a matrix with (say) 38 rows
> > and 24 columns. E. coli, for example, gets row #3. And one of the 24 genes,
> > FlgA say, gets column #5. We have 24x38 gene sequences in our database.
> >
> > Now, one way to proceed is to concatenate all of the sequences in each row,
> > and then build a tree of rows for the phylogeny. Then, concatenate all
> > of the sequences in each column and build a separate tree of columns for the
> > gene duplication/divergence hypotheses. Is this valid?
>
> I'm not sure what you want to do so I'll just propose a way of
> analysing it.

I'm not completely sure what I want either. One thing, of course, is to
have some statistical measure of how good the hypothesis of gene duplication
and divergence is (and how good is the inferred tree) - presumably with
reference to a null hypothesis that there is no shared ancestry between the
genes. Also, the authors seem to think that the hypothesis is best supported
when you look at lots of species simultaneously, rather than at each separately.
That seems reasonable, but how do you go about doing that?
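
As a crude illustration of what testing against a "no shared ancestry" null could look like for a single pair of sequences, here is a shuffle test sketched in Python. It assumes the two sequences are already aligned to the same length and uses plain percent identity as the score; both are simplifying assumptions of this sketch, the function names are made up, and it is not the method used in the paper. A serious version would re-align after each shuffle and use a proper substitution matrix.

```python
import random


def identity(a, b):
    """Fraction of aligned positions at which the two sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def shuffle_p_value(a, b, n_shuffles=1000, seed=0):
    """Estimate how often a shuffled copy of b matches a as well as b does.

    Shuffling keeps the composition of b but destroys the order of its
    residues, which serves as a stand-in for "unrelated sequence".
    """
    rng = random.Random(seed)
    observed = identity(a, b)
    letters = list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        if identity(a, "".join(letters)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)
```

A small value from shuffle_p_value(seq1, seq2) says the observed similarity is unlikely under this particular null. It does not by itself distinguish shared ancestry from convergence, and it says nothing about pooling evidence across many species, which is the harder part of the question.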

> Start with all the 24x38 sequences in one big tree and
> see how they cluster (because similarity, the initial homology criterion
> I imagine, isn't the same as phylogenetic relationship). If you're
> satisfied with that, that is, you have your 24 clusters, each of
> probably homologous proteins, then compare gene-phylogenies (a
> separate tree for each column) with each other and with
> species-phylogenies (the concatenated paralogue sequences and preferably
> also some other non-flagellar sequences) to see if they agree. If they
> do, conclude that the duplications occurred in a common ancestor.
>
> I'd be surprised if the results were clearcut in the end :)

My impression is that this is pretty much what they did. But the question
arises - what do you do if the phylogenies for each gene do not agree?
There was some discussion in the paper of a particular difference between their
flagellum protein tree and the standard tree from other universal protein
sequences. They interpret that as evidence for a horizontal transfer of the
whole flagellum complex. Seems fairly reasonable to me - IF they are right
on the other stuff.

> > I realize that
> > you have to somehow make sure that the alignments in the rows and columns
> > match up, but is there anything else that needs to be done?
> >
> > Of course, the problem becomes more complex if you don't know (or don't assume)
> > that all of the gene duplication/divergences took place before all of the
> > organism lineage branches. Or if some of the diverged genes have been lost
> > in some lineages. Felsenstein's book touches briefly on some of the issues
> > here, but I am wondering whether there is a more complete treatment anywhere.
> >
> > Also, I am curious whether phylogeny experts here agree with the bloggers and
> > with my own intuitions that this flagellum paper seriously overstates its case.
>
> If no one answers that, I hope I can get around to reading the paper in a
> few days and then pretend to be an expert on protein phylogenies.

Actually, you probably need to read half a dozen papers and skim Joe's book
(like I have) before you can convincingly pretend to be an expert. ;-)

