How to calculate a matrix of pairwise counts from a 'long' data frame
From: Iain Dillingham <iain.dilling...@gmail.com>
Date: Thu, 1 Nov 2012 09:16:10 -0700 (PDT)
Local: Thurs, Nov 1 2012 12:16 pm
Subject: How to calculate a matrix of pairwise counts from a 'long' data frame

Hello everyone,

I have a 'long' data frame with id and featureCode columns. The featureCode
column contains values of a categorical variable; each record has between 1
and 9 of these. For example:

id  featureCode
5   PPLC
5   PCLI
6   PPLC
6   PCLI
7   PPL
7   PPLC
7   PCLI
8   PPLC
9   PPLC
10  PPLC

I'd like to calculate the number of times each feature code is used with
the other feature codes (the "pairwise counts" of the title). Ultimately,
the result would be a matrix. For example:

      PPLC  PCLI  PPL
PPLC  0     3     1
PCLI  3     0     1
PPL   1     1     0

However, I suspect to get this far I need to use plyr (or similar) to
produce an intermediate data frame in the form:

id  featureCode1  featureCode2
5   PPLC          PCLI
5   PCLI          PPLC

I scoured the web (and the ggplot2 book, which has a section on plyr) for
help and came up with the following:

my_func <- function(df)
{
  with(df, data.frame(
    for (i in 1:length(featureCode))
    {
      for (j in 1:length(featureCode))
      {
        if (i != j)
        {
          featureCode1 = featureCode[i]
          featureCode2 = featureCode[j]
        }
      }
    }
  ))

}

reports.pairs <- ddply(reports.long, .(id), my_func)

Not surprisingly, it doesn't work (I'm an R beginner and come from a Java
background). However, I include it here to give you an idea of what I'm
trying to do.

Could anyone suggest where I might be going wrong? Thanks in advance for
any help.

Iain
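
For readers following along, here is a minimal base R sketch of what the function above appears to be aiming for: every ordered pair of distinct feature codes within each id. The helper name pairs_for_id is hypothetical, and split/lapply/rbind stands in for ddply so that the sketch needs no packages. It is one possible way to build the intermediate data frame, not necessarily the best one.

```r
# Example data from the post above
reports.long <- data.frame(
  id = c(5, 5, 6, 6, 7, 7, 7, 8, 9, 10),
  featureCode = c("PPLC", "PCLI", "PPLC", "PCLI", "PPL",
                  "PPLC", "PCLI", "PPLC", "PPLC", "PPLC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: all ordered pairs of distinct codes for one id.
# An id with a single code yields a zero-row data frame.
pairs_for_id <- function(d) {
  codes <- d$featureCode
  idx <- expand.grid(i = seq_along(codes), j = seq_along(codes))
  idx <- idx[idx$i != idx$j, ]
  data.frame(id = rep(d$id[1], nrow(idx)),
             featureCode1 = codes[idx$i],
             featureCode2 = codes[idx$j],
             stringsAsFactors = FALSE)
}

# Apply the helper to each id and stack the pieces
pieces <- lapply(split(reports.long, reports.long$id), pairs_for_id)
reports.pairs <- do.call(rbind, pieces)
rownames(reports.pairs) <- NULL

# Cross-tabulate the two code columns to get the pairwise counts
pair_counts <- with(reports.pairs, table(featureCode1, featureCode2))
print(pair_counts)
```

Because a code is never paired with itself here, the diagonal of the resulting table is zero without any further step.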

From: Winston Chang <winstoncha...@gmail.com>
Date: Thu, 1 Nov 2012 11:38:35 -0500
Local: Thurs, Nov 1 2012 12:38 pm
Subject: Re: How to calculate a matrix of pairwise counts from a 'long' data frame

You can convert it to wide format like this:

# Read the example data into a data frame
dat <- read.table(header=TRUE, con <- textConnection('
id  featureCode
5   PPLC
5   PCLI
6   PPLC
6   PCLI
7   PPL
7   PPLC
7   PCLI
8   PPLC
9   PPLC
10  PPLC'))
close(con)

# Convert to wide format
library(reshape2)
dat_wide <- dcast(dat, id ~ featureCode)
#   id PCLI  PPL PPLC
# 1  5 PCLI <NA> PPLC
# 2  6 PCLI <NA> PPLC
# 3  7 PCLI  PPL PPLC
# 4  8 <NA> <NA> PPLC
# 5  9 <NA> <NA> PPLC
# 6 10 <NA> <NA> PPLC

After this stage, I'm not sure the best way to count up the pairings. I can
think of some not-very-elegant ways to do it, but maybe someone else will
have better ideas.

-Winston
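
One possible way to count the pairings, sketched here in base R rather than with reshape2: build an id-by-code incidence table and take its cross-product. The sketch assumes each (id, featureCode) combination appears at most once in the data; duplicates would inflate the counts. The names incidence and pair_counts are chosen only for illustration.

```r
dat <- data.frame(
  id = c(5, 5, 6, 6, 7, 7, 7, 8, 9, 10),
  featureCode = c("PPLC", "PCLI", "PPLC", "PCLI", "PPL",
                  "PPLC", "PCLI", "PPLC", "PPLC", "PPLC"),
  stringsAsFactors = FALSE
)

# One row per id, one column per code; 1 where the id has that code
incidence <- table(dat$id, dat$featureCode)

# crossprod(x) computes t(x) %*% x, so entry [a, b] is the number of
# ids that carry both code a and code b
pair_counts <- crossprod(incidence)

# The diagonal holds each code's own frequency; zero it out to match
# the matrix requested in the original post
diag(pair_counts) <- 0
print(pair_counts)
```

The result is symmetric, with rows and columns in alphabetical order of the feature codes.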

From: Peter Meilstrup <peter.meilst...@gmail.com>
Date: Thu, 1 Nov 2012 10:02:04 -0700
Local: Thurs, Nov 1 2012 1:02 pm
Subject: Re: How to calculate a matrix of pairwise counts from a 'long' data frame
Er, not sure how that got sent. Try merging the data frame with itself:

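# merge() suffixes the duplicated column with .x and .y;
# plcc.x and plcc.y assume the code column in df is named plcc
j <- merge(df, df, by="id")

then count cases:

library(plyr)  # provides count()
c <- count(j, c("plcc.x", "plcc.y"))

then convert to matrix:

library(reshape2)  # provides acast()
acast(c, plcc.x ~ plcc.y)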

From: Iain Dillingham <iain.dilling...@gmail.com>
Date: Fri, 2 Nov 2012 05:46:00 -0700 (PDT)
Subject: Re: How to calculate a matrix of pairwise counts from a 'long' data frame

Thanks for your help. Unfortunately the merge isn't quite what I'm looking
for, as it double counts categories. However, if you're interested I also

Iain

From: Peter Meilstrup <peter.meilst...@gmail.com>
Date: Fri, 2 Nov 2012 16:44:54 -0700
Local: Fri, Nov 2 2012 7:44 pm
Subject: Re: How to calculate a matrix of pairwise counts from a 'long' data frame
Ah, I see. So, if you don't want to count a feature appearing with
itself you finish by setting the diagonal to zero.

m <- acast(c, plcc.x ~ plcc.y)
diag(m) <- 0

That seems to reproduce your example data?
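
As a rough check of the merge approach against the example data, here is a self-contained base R sketch. It assumes the code column is named featureCode (so merge produces featureCode.x and featureCode.y rather than the plcc names used above), and it swaps plyr's count and reshape2's acast for base table so that no packages are needed.

```r
dat <- data.frame(
  id = c(5, 5, 6, 6, 7, 7, 7, 8, 9, 10),
  featureCode = c("PPLC", "PCLI", "PPLC", "PCLI", "PPL",
                  "PPLC", "PCLI", "PPLC", "PPLC", "PPLC"),
  stringsAsFactors = FALSE
)

# Self-merge on id: every code of an id is paired with every code of
# the same id, including itself
j <- merge(dat, dat, by = "id")

# Cross-tabulate the two code columns; unclass() leaves a plain matrix
m <- unclass(table(j$featureCode.x, j$featureCode.y))

# Before zeroing, the diagonal counts each code paired with itself
self_counts <- diag(m)

# Drop the self-pairings
diag(m) <- 0

# Reorder rows and columns to match the layout in the original post
codes <- c("PPLC", "PCLI", "PPL")
m <- m[codes, codes]
print(m)
```

The self-pairings on the diagonal are the only difference between the raw merge counts and the requested matrix, which is why zeroing the diagonal is enough.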
