CCA of barcoding data - OTUs distributed largely in regular geometric patterns such as triangles - any experience? Comments?

Susanne Schmidt

Sep 21, 2015, 3:46:08 PM
to GUSTA ME User Forum
When performing a CCA of barcoding data in R, a colleague and I recently ended up with plots where the OTUs were distributed largely in regular geometric patterns such as (striated) triangles. And that's after the worst collinearity had been removed. I am new to barcoding data. I never observed such strong patterns when analyzing just the handful of morphologically identified taxa, but maybe that's only because there were so few 'dots'?
Does anyone have any experience with barcoding OTUs in CCA? I haven't found anything published yet. Comments?

Pier Luigi Buttigieg

Sep 21, 2015, 3:52:44 PM
to Susanne Schmidt, GUSTA ME User Forum
Thanks for the post! Could you describe the dimensions of your data sets and any transformation of the data prior to analysis? Also, are you convinced that you have sampled far enough along the gradients of interest to expect unimodal distributions of each OTU? Are the explanatory variables (or the [constrained] OTU counts) strongly anti-correlated? If so, they may 'force' sites as far away from each other as possible in low dimension, creating what appear to be regular spacings.
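As a rough illustration of the anti-correlation check: "anti-correlated" here means strongly negatively correlated, so one way to look for it is to compute pairwise Pearson correlations and list the pairs with a strongly negative r. The sketch below is in plain Python for illustration only; the function names, the toy variables and the -0.8 cut-off are all assumptions, not part of any package mentioned in this thread.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a constant variable has no defined correlation
    return cov / sqrt(var_x * var_y)


def anticorrelated_pairs(columns, threshold=-0.8):
    """Return (name_a, name_b, r) for every pair with r <= threshold."""
    hits = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if r <= threshold:  # NaN compares False, so constant columns are skipped
            hits.append((name_a, name_b, round(r, 3)))
    return sorted(hits, key=lambda hit: hit[2])


# Hypothetical toy data: three variables measured at six sites.
env = {
    "depth": [1, 2, 3, 4, 5, 6],
    "light": [6, 5, 4, 3, 2, 1],  # falls exactly as depth rises
    "salinity": [3, 1, 4, 1, 5, 9],
}
pairs = anticorrelated_pairs(env)
print(pairs)
```

The same function can be applied to OTU columns instead of environmental variables, since it only needs one numeric sequence per name.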

Susanne Schmidt

Oct 1, 2015, 8:42:00 AM
to GUSTA ME User Forum, susanne...@googlemail.com
Sorry for the delay with this clarification.
The OTU table has 28 observations of 2132 variables (OTUs). I am not sure whether CCA is still a suitable method for such a data set.
The OTUs are presence/absence transformed, since the machine's actual readings are, I am told, not 100% biologically meaningful. This means that many OTUs show exactly the same pattern across the 28 observations, and/or the same pattern as other OTUs. Should I somehow filter down to a meaningful data set? Is there a simple command for that, or should I script something myself? I couldn't find the term "anti-correlated" on GUSTA ME and had never come across it before, but I guess I can make sense of it. How would I test for it: is there a command already? Does one of the walk-throughs show a relevant procedure that I missed?
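A minimal sketch of what such a filter could look like, written in plain Python for illustration only: OTUs with an identical presence/absence pattern are grouped so that each distinct pattern appears once, and OTUs that are present everywhere or almost nowhere are set aside. The function name, the toy table and the `min_sites` cut-off are assumptions; whether rare OTUs should be dropped at all is a judgment call for the data set at hand.

```python
from collections import defaultdict


def collapse_otus(table, min_sites=2):
    """Group OTUs that share the same presence/absence pattern.

    table maps an OTU name to a list of 0/1 values, one per observation.
    OTUs present in fewer than min_sites observations, or present in all
    of them, are dropped; the rest are grouped by identical pattern.
    """
    groups = defaultdict(list)
    for otu, pattern in table.items():
        n_present = sum(pattern)
        if n_present < min_sites or n_present == len(pattern):
            continue  # assumed to carry little information about the gradient
        groups[tuple(pattern)].append(otu)
    return dict(groups)


# Hypothetical toy table: five OTUs across four observations.
otus = {
    "OTU1": [1, 1, 0, 0],
    "OTU2": [1, 1, 0, 0],  # same pattern as OTU1
    "OTU3": [0, 0, 1, 1],
    "OTU4": [1, 1, 1, 1],  # present everywhere
    "OTU5": [0, 0, 0, 1],  # present at a single site
}
groups = collapse_otus(otus)
for pattern, members in groups.items():
    print(pattern, members)
```

Each key is one distinct pattern and its list holds the OTUs sharing it, so one representative per pattern could be passed on to the ordination while the group sizes stay available for interpretation.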

Environmental measurements (28 observations of 11 features) are normalized.
The "worst" plots appear, though, when only a selection of the environmental features is used rather than the full set. The selection is based either on the strict research question, i.e. treatment, or on e.g. step-wise regression (whether the latter is a valid procedure is another discussion, I guess). Treatment is of course a factor, in this case with two levels, versus >2000 OTUs.

Most of the environmental features do cover enough of the gradient of interest for unimodal distributions to be expected. But since some of them are so strongly correlated, it makes sense to use only a subset, just as for the OTUs. But how?
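One simple way to pick such a subset, sketched here in plain Python under stated assumptions: go through the variables in order and keep each one only if its absolute Pearson correlation with every variable already kept stays below a cut-off. The function names, the toy measurements and the 0.7 cut-off are hypothetical, and this is just one heuristic among several, not an established procedure from this thread.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; assumes neither sequence is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def weakly_correlated_subset(columns, max_abs_r=0.7):
    """Keep variables in order, skipping any whose |r| with an
    already-kept variable exceeds max_abs_r."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= max_abs_r for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy measurements at six sites.
env = {
    "temperature": [10, 12, 14, 16, 18, 20],
    "oxygen": [9.0, 8.1, 7.4, 6.2, 5.5, 4.3],  # falls as temperature rises
    "ph": [7.9, 8.1, 7.8, 8.2, 7.9, 8.1],
}
kept = weakly_correlated_subset(env)
print(kept)
```

The result depends on the order of the variables, so the ones that matter most for the research question would go first. Because the absolute value of r is used, strongly negative (anti-correlated) pairs are treated the same way as strongly positive ones.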

Or should I use an alternative method altogether, or approach this in a completely different way? Sorry if this is a basic question that is covered in every other bioinformatics handbook.