Calculating average plot similarity across sites


John W.

Aug 17, 2015, 5:40:21 PM
to Davis R Users' Group
I am using vegdist in the vegan package to calculate average pairwise Sorensen similarity values (1 - Bray) for tree species in sample plots at two field sites. I have done the comparison for all plots within each site, but now I want to compare the two sites (which also share species) and get an average (mean) value for the comparisons. Alas, I don't know how to set it up so that R only compares plots from site A with plots from site B, and NOT plots within A or within B (this would change the average). My inclination is that I should: (1) create a csv file with a table where the plots are all in rows in the form of A1, A2, A3...A40, B1, B2, B3...B40, and the species are in columns, listed as 1...371 with 1 or 0 (presence/absence) in the corresponding cells in the body of the table; and then (2) specify that R only calculate A x B plots, or else exclude all A x A and B x B comparisons. Can anyone suggest how to do this, either along the lines I am thinking or otherwise? Thanks!

Michael Hannon

Aug 18, 2015, 7:13:02 PM
to davi...@googlegroups.com
Hi, John. I notice that you've gotten no response so far to this
question. Let me suggest that it might help if you could send us a
tiny example (7 rows and 3 columns or whatever) of (a) the data
structure you have in R to begin with, and (b) the operations that
you'd ideally like to have performed on that data structure. My
instinct is that creating a CSV file from R is almost never the right
answer (except to send to a non-R-using colleague, etc.).

-- Mike

Steve Fick

Aug 18, 2015, 8:07:57 PM
to davi...@googlegroups.com
Hey John,

...is it feasible to lump them all together, then pick out the comparisons you want afterwards? Here's an example...


# get community data (varespec ships with the vegan package)
library(vegan); data(varespec)
v <- varespec

#randomly assign 'sites'
community.ids <- row.names(v) 
site <- sample(c('a','b'), length(community.ids), replace =TRUE)
names(site) <- community.ids
site

# compute bray-curtis distance
bc <- as.matrix(vegdist(v))
# to remove redundant comparisons
bc[upper.tri(bc, diag = TRUE)] <- NA 

# 'melt' distance matrix into data.frame of pairs
library(reshape)
bcl <- melt(bc)
bcl <- na.omit(bcl)
names(bcl) <- c('row', 'column', 'value')

# identify pairs from the same site
bcl$rowSite <- site[ match( bcl$row, names(site)) ]
bcl$colSite <- site[ match( bcl$column, names(site)) ]
head(bcl)

same <- bcl$rowSite == bcl$colSite

# logical indexing avoids the empty-index pitfall of -which(same)
different <- mean(bcl[!same, 'value'])
samesies <- mean(bcl[same, 'value'])
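
A possible shortcut, sketched here in base R: rather than melting the matrix, build a logical mask that marks which plot pairs straddle the two sites and index the matrix with it. The toy matrix `m`, the `site` vector and the helper `sorensen_matrix()` are made up for illustration; with real data, something like `1 - as.matrix(vegdist(v, binary = TRUE))` would presumably take the helper's place.

```r
# Hypothetical toy data: 4 plots x 4 species, presence/absence
m <- rbind(a1 = c(1, 1, 0, 0),
           a2 = c(1, 0, 1, 0),
           b1 = c(1, 1, 0, 1),
           b2 = c(0, 1, 0, 1))
site <- c("a", "a", "b", "b")   # site label for each row of m

# Sorensen similarity for binary data: 2 * shared / (richness_i + richness_j)
sorensen_matrix <- function(x) {
  shared <- x %*% t(x)          # number of species shared by each pair of plots
  rich <- rowSums(x)            # number of species in each plot
  2 * shared / outer(rich, rich, "+")
}
sim <- sorensen_matrix(m)

# TRUE where the two plots come from different sites
between <- outer(site, site, "!=")
# TRUE for each pair exactly once (lower triangle, diagonal excluded)
once <- lower.tri(sim)

mean_between <- mean(sim[between & once])
mean_within  <- mean(sim[!between & once])
mean_between
mean_within
```

Because the mask is built from the `site` vector, the plots do not need to be sorted by site for this to work.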







Steve Fick

Aug 18, 2015, 8:22:52 PM
to davi...@googlegroups.com
WUPS, EDIT:
line 8 should read: names(site) <- community.ids

John W.

Aug 19, 2015, 8:03:32 PM
to Davis R Users' Group
Thanks Steve and Mike (and anyone else who wants to weigh in) for the posts. Steve's idea might work, but I'm still trying to figure out if I understand it. I have attached a mini example of my data file as a csv here. In it I have 6 different species as numbers in row one (column names), where columns 2 and 3 (species 1 & 2) are species shared at both sites, cols 4 and 5 are only found at site A, and cols 6 and 7 are only at site B. The plots are labeled as row names in column one, with five from site A and five from site B. What I want, then, is to have R calculate the Sorensen similarity for all the A x B plot comparisons only, not the internal A x A or B x B comparisons (those I did using the code below, e.g. for A). I could manually remove the internal comparisons, but I have 84 plots and 371 species, so it would be pretty tedious. Anyway, thanks for your thoughts!

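# read the site A plots (rows = plots, columns = species, 1/0 presence/absence)
A_plots <- read.csv("C:/Users/Owner/Documents/R_Data/A_Plots_Sorensen.csv", header = TRUE, row.names = 1)
# binary Bray-Curtis dissimilarity; Sorensen similarity is 1 minus that
ADis <- vegdist(A_plots, method = "bray", binary = TRUE, diag = TRUE)
ASor <- 1 - ADis
ASor
# mean over all within-site plot pairs
ASor.ave <- mean(ASor)
ASor.ave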

Attachment: JW_Sample_Sorensen.csv
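
One way this could be set up, assuming the plot names begin with the site letter as described (A1, A2, ..., B1, B2, ...): read the site off the row names, then take only the block of the similarity matrix with A plots in the rows and B plots in the columns. The toy table below is invented to follow the described layout (species 1 and 2 shared, 3 and 4 only at A, 5 and 6 only at B); it is not the attached file, and the Sorensen calculation is a base-R stand-in for `1 - vegdist(..., method = "bray", binary = TRUE)`.

```r
# Invented presence/absence table following the described layout
# (NOT the contents of the attached CSV)
pa <- rbind(A1 = c(1, 1, 1, 0, 0, 0),
            A2 = c(1, 0, 1, 1, 0, 0),
            B1 = c(1, 1, 0, 0, 1, 0),
            B2 = c(0, 1, 0, 0, 1, 1))
colnames(pa) <- 1:6

# Site letter taken from the first character of each plot name
site <- substr(rownames(pa), 1, 1)

# Sorensen similarity = 1 - binary Bray-Curtis = 2 * shared / (n_i + n_j)
shared <- pa %*% t(pa)      # species shared by each pair of plots
n <- rowSums(pa)            # species count per plot
sor <- 2 * shared / outer(n, n, "+")

# Rows from site A, columns from site B: every A x B pair exactly once,
# and no A x A or B x B comparisons at all
ab <- sor[site == "A", site == "B", drop = FALSE]
ab
mean_ab <- mean(ab)
mean_ab
```

With 84 plots the same indexing applies unchanged, since the block is selected by site label rather than by row position.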

Jaime Ashander

Aug 19, 2015, 8:47:59 PM
to davi...@googlegroups.com
John,

This might get you there. I think it's in the spirit of what Steve suggested: do all the comparisons, make the within-site comparisons null, then take the average ignoring the null values.

library(vegan)
d <- read.csv("JW_Sample_Sorensen.csv")
# Sorensen similarity = 1 - binary Bray-Curtis; d[, -1] drops the first (plot name) column
d.dist <- 1 - vegdist(d[, -1], method = "bray", binary = TRUE, diag = TRUE)
# blank out the within-site comparisons
plots.per.site <- 5
within.site.value <- NA
d.mat <- as.matrix(d.dist)
d.mat[1:plots.per.site, 1:plots.per.site] <- within.site.value
d.mat[(plots.per.site + 1):(2 * plots.per.site), (plots.per.site + 1):(2 * plots.per.site)] <- within.site.value
# if you have > 2 sites, you need to put the above in a loop
# basic strategy is to set every within-site block on the diagonal to NA
d.site.dist <- as.dist(d.mat)
# this gives the mean across all comparisons of plots between the two
# sites. for > 2 sites, getting site-site distances would be more complicated
mean(d.site.dist, na.rm = TRUE)
[1] 0.2855238
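
For more than two sites, one option is to average each site-by-site block in a double loop, which gives a small matrix of mean similarities with one row and column per site. This is only a sketch on invented data: the table `pa`, the three site labels and the base-R Sorensen calculation (a stand-in for `1 - vegdist(pa, binary = TRUE)`) are all assumptions. vegan's `meandist()` is documented as computing within- and between-group mean dissimilarities and may be a ready-made alternative; check its help page before relying on it.

```r
# Invented presence/absence data: 6 plots at 3 hypothetical sites
pa <- rbind(A1 = c(1, 1, 0, 0),
            A2 = c(1, 1, 1, 0),
            B1 = c(1, 0, 1, 0),
            B2 = c(0, 1, 1, 0),
            C1 = c(0, 0, 1, 1),
            C2 = c(0, 0, 0, 1))
site <- substr(rownames(pa), 1, 1)

# Sorensen similarity: 2 * shared / (n_i + n_j)
n <- rowSums(pa)
sor <- 2 * (pa %*% t(pa)) / outer(n, n, "+")
diag(sor) <- NA   # drop each plot's comparison with itself

# Mean similarity for every pair of sites;
# the diagonal of site_mean holds the within-site means
sites <- unique(site)
site_mean <- matrix(NA_real_, length(sites), length(sites),
                    dimnames = list(sites, sites))
for (i in sites) {
  for (j in sites) {
    site_mean[i, j] <- mean(sor[site == i, site == j], na.rm = TRUE)
  }
}
site_mean
```

The off-diagonal entries are the between-site averages asked about in this thread; for two sites the matrix reduces to a single A x B value plus the two within-site means.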

