How to calculate a matrix of pairwise counts from a 'long' data frame
Iain Dillingham  
From: Iain Dillingham <iain.dilling...@gmail.com>
Date: Thu, 1 Nov 2012 09:16:10 -0700 (PDT)
Local: Thurs, Nov 1 2012 12:16 pm
Subject: How to calculate a matrix of pairwise counts from a 'long' data frame

Hello everyone,

I have a 'long' data frame with id and featureCode columns. The featureCode
column contains values of a categorical variable; each record has between 1
and 9 of these. For example:

id  featureCode
5   PPLC
5   PCLI
6   PPLC
6   PCLI
7   PPL
7   PPLC
7   PCLI
8   PPLC
9   PPLC
10  PPLC

I'd like to calculate the number of times each feature code is used with
the other feature codes (the "pairwise counts" of the title). Ultimately,
the result would be a matrix. For example:

      PPLC  PCLI  PPL
PPLC  0     3     1
PCLI  3     0     1
PPL   1     1     0

However, I suspect that to get this far I need to use plyr (or similar) to
produce an intermediate data frame of the form:

id  featureCode1  featureCode2
5   PPLC          PCLI
5   PCLI          PPLC

I scoured the web (and the ggplot2 book, which has a section on plyr) for
help and came up with the following:

my_func <- function(df)
{
  with(df, data.frame(
    for (i in 1:length(featureCode))
    {
      for (j in 1:length(featureCode))
      {
        if (i != j)
        {
          featureCode1 = featureCode[i]
          featureCode2 = featureCode[j]
        }
      }
    }
  ))

}

reports.pairs <- ddply(reports.long, .(id), my_func)

Not surprisingly, it doesn't work (I'm an R beginner and come from a Java
background). However, I include it here to give you an idea of what I'm
trying to do.

Could anyone suggest where I might be going wrong? Thanks in advance for
any help.

Iain
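
For readers following along, here is a minimal base-R sketch of the intermediate data frame described above. It is not code from the thread: the inner function is a hypothetical stand-in for my_func, and expand.grid() replaces the nested loops. It assumes the example data from the question under the name reports.long.

```r
# Example data from the question (name 'reports.long' as used in the post).
reports.long <- data.frame(
  id = c(5, 5, 6, 6, 7, 7, 7, 8, 9, 10),
  featureCode = c("PPLC", "PCLI", "PPLC", "PCLI", "PPL",
                  "PPLC", "PCLI", "PPLC", "PPLC", "PPLC"),
  stringsAsFactors = FALSE
)

# For each id, build every ordered pair of distinct feature codes.
reports.pairs <- do.call(rbind, lapply(unique(reports.long$id), function(i) {
  codes <- reports.long$featureCode[reports.long$id == i]
  pairs <- expand.grid(featureCode1 = codes, featureCode2 = codes,
                       stringsAsFactors = FALSE)
  # Drop pairs of a code with itself (the 'i != j' test in the question).
  pairs <- pairs[pairs$featureCode1 != pairs$featureCode2, , drop = FALSE]
  data.frame(id = rep(i, nrow(pairs)), pairs, stringsAsFactors = FALSE)
}))
rownames(reports.pairs) <- NULL

# Cross-tabulate the two code columns to get the pairwise counts.
pair_counts <- table(reports.pairs$featureCode1, reports.pairs$featureCode2)
```

Ids with a single feature code contribute no rows, so they do not affect the counts.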


 
Winston Chang  
From: Winston Chang <winstoncha...@gmail.com>
Date: Thu, 1 Nov 2012 11:38:35 -0500
Local: Thurs, Nov 1 2012 12:38 pm
Subject: Re: How to calculate a matrix of pairwise counts from a 'long' data frame

You can convert it to wide format like this:

dat <- read.table(header=TRUE, con <- textConnection('
  id  featureCode
  5   PPLC
  5   PCLI
  6   PPLC
  6   PCLI
  7   PPL
  7   PPLC
  7   PCLI
  8   PPLC
  9   PPLC
  10  PPLC'))
close(con)

# Convert to wide format
library(reshape2)
dat_wide <- dcast(dat, id ~ featureCode, value.var="featureCode")
#   id PCLI  PPL PPLC
# 1  5 PCLI <NA> PPLC
# 2  6 PCLI <NA> PPLC
# 3  7 PCLI  PPL PPLC
# 4  8 <NA> <NA> PPLC
# 5  9 <NA> <NA> PPLC
# 6 10 <NA> <NA> PPLC

(see http://wiki.stdout.org/rcookbook/Manipulating%20data/Converting%20bet...)

After this stage, I'm not sure the best way to count up the pairings. I can
think of some not-very-elegant ways to do it, but maybe someone else will
have better ideas.

-Winston
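
One possible way to count the pairings, sketched here in base R rather than taken from the thread: build an id-by-code incidence table and multiply its transpose by itself with crossprod(). Entry (a, b) of the product is the number of ids that carry both code a and code b. The sketch assumes the example data above, rebuilt with data.frame() so that it stands alone.

```r
# Example data from the question.
dat <- data.frame(
  id = c(5, 5, 6, 6, 7, 7, 7, 8, 9, 10),
  featureCode = c("PPLC", "PCLI", "PPLC", "PCLI", "PPL",
                  "PPLC", "PCLI", "PPLC", "PPLC", "PPLC"),
  stringsAsFactors = FALSE
)

# Rows are ids, columns are feature codes; cells count occurrences.
incidence <- table(dat$id, dat$featureCode)

# Reduce to 0/1 so a repeated (id, code) row is not counted twice.
incidence <- (incidence > 0) * 1

# t(incidence) %*% incidence: ids shared by each pair of codes.
pair_counts <- crossprod(incidence)

# A code paired with itself is not of interest here.
diag(pair_counts) <- 0
```

Before the diagonal is zeroed it holds the number of ids using each code, which may be worth keeping if those totals are needed.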



 
Peter Meilstrup  
From: Peter Meilstrup <peter.meilst...@gmail.com>
Date: Thu, 1 Nov 2012 10:02:04 -0700
Local: Thurs, Nov 1 2012 1:02 pm
Subject: Re: How to calculate a matrix of pairwise counts from a 'long' data frame
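Er, not sure how that got sent. Try merging the data frame with itself
(here df is your long data frame, with columns id and featureCode):

library(plyr)      # for count()
library(reshape2)  # for acast()
j <- merge(df, df, by="id")

then count cases (merge() adds the suffixes .x and .y to the duplicated
column name):

c <- count(j, c("featureCode.x", "featureCode.y"))

then convert to matrix:

acast(c, featureCode.x ~ featureCode.y, value.var="freq", fill=0)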



 
Iain Dillingham  
From: Iain Dillingham <iain.dilling...@gmail.com>
Date: Fri, 2 Nov 2012 05:46:00 -0700 (PDT)
Subject: Re: How to calculate a matrix of pairwise counts from a 'long' data frame

Thanks for your help. Unfortunately the merge isn't quite what I'm looking
for, as it double counts categories. However, if you're interested, I also
posted the question on Stack Overflow
<http://stackoverflow.com/questions/13176741/how-to-calculate-a-table-...>
and received some useful advice.

Iain


 
Peter Meilstrup  
From: Peter Meilstrup <peter.meilst...@gmail.com>
Date: Fri, 2 Nov 2012 16:44:54 -0700
Local: Fri, Nov 2 2012 7:44 pm
Subject: Re: How to calculate a matrix of pairwise counts from a 'long' data frame
Ah, I see. So, if you don't want to count a feature appearing with
itself, you finish by setting the diagonal to zero:

m <- acast(c, featureCode.x ~ featureCode.y, value.var="freq", fill=0)
diag(m) <- 0

That seems to reproduce your example data?
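
As a check on that claim, here is a dependency-free sketch of the same self-merge idea, with base R's table() standing in for plyr's count() and reshape2's acast(). It is an illustration under the assumption that the data are the ten example rows from the original question and that no (id, featureCode) row is repeated.

```r
# Example data from the original question.
dat <- data.frame(
  id = c(5, 5, 6, 6, 7, 7, 7, 8, 9, 10),
  featureCode = c("PPLC", "PCLI", "PPLC", "PCLI", "PPL",
                  "PPLC", "PCLI", "PPLC", "PPLC", "PPLC"),
  stringsAsFactors = FALSE
)

# Self-merge on id: every code of an id is paired with every code of
# the same id, including itself.
j <- merge(dat, dat, by = "id")

# Cross-tabulate the two code columns; unclass() leaves a plain matrix.
m <- unclass(table(j$featureCode.x, j$featureCode.y))

# The diagonal holds each code paired with itself; zero it out.
diag(m) <- 0

# Reorder rows and columns to match the layout in the question.
codes <- c("PPLC", "PCLI", "PPL")
m <- m[codes, codes]
```

Before the diagonal is reset it contains the per-code totals, which is where the self-pairs from the merge end up.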



 