beta-diversity: Bray-Curtis, Jaccard and UniFrac


Anastasia

May 22, 2013, 3:43:35 AM
to qiime...@googlegroups.com
Dear QIIMErs,

I have analysed 2 datasets generated with universal bacterial primers and have used abundance-based (Bray-Curtis, Jaccard and weighted UniFrac) and occurrence-based (unweighted UniFrac) beta-diversity measures. For the analysed datasets I got the following results:
Bray-Curtis, 80.02%
Jaccard, 76.90%
weighted UniFrac, 24.02%
unweighted UniFrac, 71.71%
As far as I understand, all of them express the dissimilarity of the compared samples and, in contrast to the other indices, UniFrac also incorporates phylogenetic information. How could one explain that the weighted UniFrac value is so much lower than the Bray-Curtis and Jaccard values? Does it mean that these datasets contain OTUs that are phylogenetically highly similar to each other?

Thank you in advance.
Anastasia

Antonio González Peña

May 22, 2013, 8:48:57 AM
to Qiime Forum
I'm guessing that the numbers you are showing are the percentage of variation
explained by the first axis of each method, right? I think the reasons for the
differences are: (1) they are different algorithms, and (2) as
you say, the most abundant OTUs are phylogenetically similar; note that
unweighted UniFrac gives a high % too. Now, the question is whether you are
seeing the separation you are expecting.
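
For readers unsure what "percentage explained by the first axis" refers to: in a PCoA, each axis has an eigenvalue, and the percentage is that eigenvalue's share of the total. The sketch below is only an illustration of that arithmetic. The function name and the eigenvalues are made up, and clipping negative eigenvalues to zero is an assumption (tools differ in how they handle them).

```python
def percent_explained(eigenvalues):
    """Return the percentage of total variation carried by each PCoA axis."""
    # Assumption: negative eigenvalues are clipped to zero before summing;
    # real tools may treat them differently.
    positive = [max(ev, 0.0) for ev in eigenvalues]
    total = sum(positive)
    if total == 0:
        raise ValueError("no positive eigenvalues")
    return [100.0 * ev / total for ev in positive]


# Hypothetical eigenvalues, not taken from the datasets in this thread.
example = percent_explained([4.0, 1.0])
print(example)
```

A high first-axis percentage therefore says that one gradient dominates the distance matrix for that metric; it is not itself a dissimilarity between two samples.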

Anyway, here is a paper comparing different non-phylogenetic (or maybe
star phylogeny would be a better name?) measures:
http://www.ncbi.nlm.nih.gov/pubmed/20818378

--
Antonio

Anastasia

May 23, 2013, 4:27:56 AM
to qiime...@googlegroups.com
Dear Antonio,

I expected a big difference between these datasets, but was confused by the value of the weighted UniFrac.

Thanks for your explanation and for the paper link.

Best regards,
Anastasia


On Wednesday, May 22, 2013, 9:43:35 UTC+2, Anastasia wrote:

Andrew Krohn

Feb 22, 2014, 9:04:30 PM
to qiime...@googlegroups.com
I would like to weigh in here, and please, everyone, help me to better understand these indices if necessary. I have a suspicion about Anastasia's results given the way I tend to process my data. For 16S data, I typically use unweighted UniFrac. Were I to use weighted UniFrac, I would first perform qPCR (with the same conditions and primers used to produce the sequences), then modify the OTU table to express properly weighted values by taking the percentage of each OTU per sample and multiplying it by the copy number obtained by qPCR. This is the only time I have seen weighted UniFrac make sense.
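
The table adjustment described above (relative abundance of each OTU times the qPCR copy number for that sample) could be sketched roughly as follows. This is only an illustration under stated assumptions: `to_absolute` and `copies_per_sample` are hypothetical names, the counts are made up, and the copy number is assumed to be a single total per sample.

```python
def to_absolute(counts, copies_per_sample):
    """Scale one sample's OTU counts to absolute abundances.

    counts: sequence counts per OTU for one sample.
    copies_per_sample: total gene copies measured by qPCR for that sample
    (hypothetical parameter name).
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    # Relative abundance of each OTU times the qPCR copy number.
    return [copies_per_sample * c / total for c in counts]


# Made-up numbers: two samples with identical composition but different biomass.
sample_a = to_absolute([50, 30, 20], copies_per_sample=1000)
sample_b = to_absolute([50, 30, 20], copies_per_sample=4000)
print(sample_a)
print(sample_b)
```

After scaling, the two samples no longer look identical to an abundance-weighted metric, even though their relative composition is the same.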

For fungal ITS data I take a similar approach. I prefer binary Jaccard for initial analysis because it is presence/absence only, whereas Bray-Curtis weights according to abundance, as I understand it. Once I have performed qPCR, I will then consider the differences given by weighted Jaccard or Bray-Curtis.
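
To make the presence/absence versus abundance distinction concrete, here is a minimal sketch of both measures written from their textbook definitions. It is not QIIME's implementation, and the two samples are made up.

```python
def binary_jaccard(a, b):
    """Jaccard dissimilarity on presence/absence: 1 - shared OTUs / observed OTUs."""
    present_a = {i for i, n in enumerate(a) if n > 0}
    present_b = {i for i, n in enumerate(b) if n > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1.0 - len(present_a & present_b) / len(union)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum of absolute differences / total counts."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Made-up samples: the same three OTUs are present, but the dominant OTU differs.
sample_1 = [90, 5, 5]
sample_2 = [5, 5, 90]
jaccard_value = binary_jaccard(sample_1, sample_2)
bray_curtis_value = bray_curtis(sample_1, sample_2)
print(jaccard_value, bray_curtis_value)
```

Binary Jaccard sees these two samples as identical because the same OTUs occur in both, while Bray-Curtis reports a large dissimilarity because the abundances are distributed very differently.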

Andy

Andrew Krohn

Feb 24, 2014, 1:08:37 PM
to qiime...@googlegroups.com
Now I am wondering if I commented too hastily. It seems to me that qPCR is useful in obtaining quantitative results, but that it is unnecessary for weighted metrics as long as your script is adjusting the OTU table to relative abundance first. I think the addition of qPCR data would strengthen such results, but would not be required. I was working on some data this weekend with known differences. These data have very low per-sample counts (rarefied to 100 OTUs). The differences known to exist only surface in my NMS plots when the weighted metrics are used.

Greg Caporaso

Feb 24, 2014, 4:15:32 PM
to qiime...@googlegroups.com
Hey Andy,
One comment on this:


> it is unnecessary for weighted metrics as long as your script
> is adjusting the OTU table to relative abundance first.

It should also be fine if you've rarefied to even sampling depth first, which should always be the case when computing beta diversity, because the rarefied counts should correlate very highly with relative abundances (the only deviation from perfect correlation should be due to sampling error in the rarefaction process).
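
The point about rarefied counts tracking relative abundances can be checked with a small simulation. The sketch below assumes rarefaction means subsampling reads without replacement to a fixed depth; `rarefy` is a hypothetical helper, not a QIIME function, and the counts are made up.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from one sample's OTU counts."""
    # Expand counts into one label per read, e.g. [2, 1] -> [0, 0, 1].
    reads = [otu for otu, n in enumerate(counts) for _ in range(n)]
    if depth > len(reads):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)
    rarefied = [0] * len(counts)
    for otu in rng.sample(reads, depth):
        rarefied[otu] += 1
    return rarefied


counts = [600, 300, 100]  # made-up sample with 1000 reads
rarefied = rarefy(counts, depth=200)
relative = [c / sum(counts) for c in counts]
rarefied_fraction = [c / sum(rarefied) for c in rarefied]
print(relative)
print(rarefied_fraction)
```

The two printed lists should be close but not identical; the gap is the sampling error mentioned above, and it grows as the rarefaction depth gets smaller.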

Greg

Andrew Krohn

Feb 24, 2014, 4:50:39 PM
to qiime...@googlegroups.com
Thanks Greg.  Really appreciate the explanation.  

For qPCR-weighted data, one should be sure to use absolute abundances (obtained by transforming the relative-abundance OTU table with the qPCR counts), as was done by Bokulich and Mills 2013 (AEM).