comparison of bwpd between different reference gene trees


andy...@rothamsted.ac.uk

Mar 27, 2018, 12:50:37 PM
to pplacer users
Hello everybody

I've been using pplacer successfully for a while now, but this is my first time on the user group.  

I have a question regarding the comparison of bwpd of multiple genes across different environments.  I am satisfied that, within the limits of what I am working with, bwpd is insensitive to differences in sequencing depth, as published.  However, when comparing bwpd between different genes, to what extent does the choice of reference gene set allow a meaningful comparison?

Consider the following examples:  

1) I have 12 environments within which I would like to compare the diversity of two genes, geneA and geneB.  The geneA reference alignment and tree comprise 1400 sequences, while those for geneB comprise only 350 sequences.  These differences reflect the number of genes identified in the non-redundant sequence collection and so probably reflect something about the relative diversity of the two genes.
  
2) In a second example, a reference set for a third gene, geneZ, is generated using something like CD-HIT to reduce the redundancy of the reference set (this might be done in an attempt to "tune" the pHMM to improve performance), and the resulting reference set contains only 100 sequences.

If I use the jplace files generated for each gene in each environment to calculate bwpd, are the comparisons between the three genes meaningful?  That is to say, if bwpd for geneA is greater than for geneB or geneZ, can it be assumed that this is real, or is it just an artifact of the number of sequences in the respective reference sets?

I have racked my brain over this problem again and again and cannot convince myself that artifacts are unlikely.

Any insight from other users would be greatly appreciated.

cheers
Andy

Erick Matsen

Mar 27, 2018, 2:00:19 PM
to pplace...@googlegroups.com
Hello Andy--

I can't think of a way in which a BWPD comparison between genes would make sense. What is the question you wish to answer by such a comparison? 

Erick

--
You received this message because you are subscribed to the Google Groups "pplacer users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pplacer-user...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
Frederick "Erick" Matsen, Associate Member
Fred Hutchinson Cancer Research Center
http://matsen.fredhutch.org/

Andy Neal

Mar 27, 2018, 6:00:10 PM
to pplace...@googlegroups.com

Hi Erick

Thank you for the quick reply.  I’ll try and explain my interest in comparing BWPD between genes.  My question stems from the common application of alpha-diversity measures to infer differences between communities, but I would like to do this at the functional rather than taxonomic level.

 

One question relates to system resilience and whether different processes, for example nitrogen fixation or phosphorus acquisition via phosphate ester hydrolysis, are likely to be more or less resilient based upon estimates of their bwpd, for example of nifH versus phoD.

 

Another example is whether genes predicted to be subject to horizontal genetic transfer (for example antibiotic resistance genes) exhibit reduced bwpd compared to a marker gene such as the 16S rRNA gene.

 

These are ecologically interesting questions. If you think that pplacer is not the correct tool to use, I would like to know now rather than once a manuscript is submitted.

 

Cheers

Andy 


Erick Matsen

Mar 27, 2018, 8:17:24 PM
to pplace...@googlegroups.com
Hello Andy--


Thanks for your explanation. I'm not sure if I totally get it, but consider the following example. Let's say we have two genes that are right next to each other in the DNA of all the organisms under investigation. They are also equally expressed. In fact, the only difference between them is that one evolves twice as fast as the other. 

The BWPD of the faster-evolving gene will be larger because of the longer branch lengths. Is that an interesting observation? To me, if the two genes have the same expression patterns then they should be considered to be equivalent.
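
A toy calculation can make this concrete. The sketch below is illustrative only: it assumes BWPD takes the form of a sum over edges of branch length times (2 * min(D, 1 - D))^theta, where D is the fraction of reads on the distal side of an edge, and it uses a made-up four-leaf tree with reads spread evenly over the leaves. The function name `bwpd`, the edge lists and the branch lengths are all hypothetical and are not taken from pplacer or guppy.

```python
import math


def bwpd(edges, theta=1.0):
    """Balance-weighted PD for a list of (branch_length, distal_fraction) edges.

    Assumed form: sum of length * (2 * min(D, 1 - D)) ** theta over all edges.
    Edges with every read on one side (D of 0 or 1) contribute nothing.
    """
    total = 0.0
    for length, distal_fraction in edges:
        balance = 2.0 * min(distal_fraction, 1.0 - distal_fraction)
        if balance > 0.0:
            total += length * balance ** theta
    return total


# Hypothetical unrooted four-leaf tree: four pendant edges, each carrying a
# quarter of the reads, plus one internal edge that splits the reads in half.
slow_gene = [(0.10, 0.25), (0.10, 0.25), (0.10, 0.25), (0.10, 0.25), (0.05, 0.50)]

# Same organisms and same read distribution, but every branch is twice as
# long because the gene evolves twice as fast.
fast_gene = [(2.0 * length, d) for length, d in slow_gene]

slow = bwpd(slow_gene)
fast = bwpd(fast_gene)
print(f"slow gene BWPD: {slow:.3f}")
print(f"fast gene BWPD: {fast:.3f}")
print(f"ratio:          {fast / slow:.3f}")
```

Under these assumptions the read distribution is identical for the two genes, so the only thing that changes the result is the branch-length scaling: the value doubles even though the community is the same.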


Erick

Andy Neal

Mar 28, 2018, 2:52:11 AM
to pplace...@googlegroups.com

Hi Erick

 

I think we’re getting closer to the nub of the issue here.

 

Your description of the relative rates of evolution makes sense and raises an extremely interesting question – is genetic context important here, though?  You describe two genes which sit right next to each other in the genome; for the genes I am interested in, this is never the case.  Does this violate any assumptions you are making?

 

I am also intrigued by your switch from discussing BWPD to rates of evolution – does this imply that BWPD is in fact a measure of rate of evolution rather than a diversity metric?  My original interest was in whether there are relationships between environment and gene diversity (BWPD), and I chose to compare functional genes to the BWPD of 16S rRNA in each environment.  Here is an example of the resulting plot:

In this example, phoD BWPD is reduced (less diverse) compared to 16S – but there are also differences between the environments.  Taking the tundra and Angelo datasets in the plot, is it reasonable to conclude that phoD is more diverse in Angelo than in tundra based upon the placement of reads across the phoD reference tree?  Your comment regarding rates of evolution suggests an alternative explanation – that 16S is evolving at a greater rate than phoD (an interesting observation in itself), but since the same tree is used for phoD across the sites, we cannot draw this conclusion for the differences between the sites – the explanation has to remain one around the diversity of phoD.  Am I correct (or even making myself clear)?

For a second gene, the relationship looks like this across the same sites:

For coxL there appears to be greater diversity than for 16S at some of the sites, and less at others.  Looking at tundra and Angelo again, Angelo exhibits a greater diversity of the coxL gene than is present in tundra – consistent with what is observed for phoD above.  Your alternative explanation suggests that coxL may be evolving at a greater rate than both 16S (at least at some sites) and phoD.

My initial question related to whether it is reasonable to compare, let’s say phoD and 16S rRNA, given the fact that the trees are very different in size (and topology).  In your replies you appear not to have freaked out, and so I assume that you cannot see any fundamental impediment to taking this approach.  The issue you raise regarding rates of evolution is an additional insight that I had not considered, but is extremely valid when considering how different environments “function”.  Does my explanation of the two plots make sense?

 

Thank you for an extremely stimulating discussion

Erick Matsen

Mar 29, 2018, 12:26:56 PM
to pplace...@googlegroups.com
Hello Andy--


Fundamentally, PD is a measure of total branch length. Branch length is an integral of rate across time.

It makes sense as a measure of diversity if we are comparing between data sets for a given gene. Once we are comparing between trees then we get conflation with tree structure and evolutionary rates.
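
Written out as a rough sketch, with notation that is introduced here purely for illustration (the rate function r_g, the edge times t_i^0 and t_i^1, and the scaling factor c do not appear elsewhere in this thread):

```latex
\begin{align*}
  % Length of edge i for gene g: substitution rate integrated over the
  % time interval spanned by that edge.
  \ell_i^{(g)} &= \int_{t_i^{0}}^{t_i^{1}} r_g(t)\,dt \\
  % PD of a sample: total length of the edges spanned by the sample.
  \mathrm{PD}^{(g)} &= \sum_{i \in E_{\mathrm{sample}}} \ell_i^{(g)} \\
  % Same tree and same sample, but gene B evolves c times faster than gene A.
  r_B(t) = c\,r_A(t) \;&\Longrightarrow\; \ell_i^{(B)} = c\,\ell_i^{(A)}
  \;\Longrightarrow\; \mathrm{PD}^{(B)} = c\,\mathrm{PD}^{(A)}
\end{align*}
```

Within one gene the rate term is shared by every sample placed on that reference tree, so differences in PD reflect which edges the samples span. Between genes the rate term (and the tree itself) changes, which is the conflation described above.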

The genes in my example don't need to be next to one another. 

I should have been more "freak out": this doesn't seem like a good idea. Whether it's evolutionary rates or tree structure, we are comparing fundamentally different things once the gene changes, so it can't be used as a measure of microbial diversity.


Erick

Andy Neal

Apr 4, 2018, 7:46:10 AM
to pplace...@googlegroups.com

Hello again Erick

I dropped the ball over Easter but wanted to thank you for your reply. You have confirmed my suspicions that there were issues with the approach I was taking.

 

So, to summarize – comparing BWPD of a gene across different environments is reasonable but comparison of different genes across different environments brings with it many attendant issues and should not be tried!

 

Thank you

Erick Matsen

Apr 4, 2018, 12:13:34 PM
to pplace...@googlegroups.com
Agreed!