# Issue with permutation test


### Christoph Ruehlemann

May 4, 2018, 1:15:04 AM

Dear All

I'm interested in linguistic units occurring in two different conditions: (i) as utterances (or 'speaking turns') and (ii) as constructed dialog (or 'direct speech'). Here's a sample dataframe:
```
ALL <- data.frame(
  Units = c("yeah", "mm", "no", "oh", "yes"),
  FREQinCD = c(12, 1, 19, 13, 6),
  FREQinUTT = c(352, 199, 122, 72, 70)
)
```

To establish whether the units occur more frequently in either condition I want to perform a permutation test. Here's the code:

```
n_cd <- sum(ALL$FREQinCD)   # Total number of tokens in constructed dialog (1769 in the actual, much bigger df)
n_utt <- sum(ALL$FREQinUTT) # Total number of tokens in utterances (8064 in the actual df)
for (i in seq_along(ALL$Units)) {
  x_cd <- ALL[i, 2]  # Frequency of i-th unit in constructed dialog
  x_utt <- ALL[i, 3] # Frequency of i-th unit in utterances
  Occurrence_cd <- c(rep(1, x_cd), rep(0, n_cd - x_cd))     # 0/1 vector for constructed dialog
  Occurrence_utt <- c(rep(1, x_utt), rep(0, n_utt - x_utt)) # 0/1 vector for utterances
  p <- perm.test(Occurrence_cd, Occurrence_utt, conf.level = 0.95, exact = TRUE, conf.int = TRUE)
  if (i == 1) print(c("Word", "Freq_cd", "Freq_utt", "CI_lower", "CI_upper", "P_perm"))
  print(c(as.character(ALL$Units[i]), x_cd, x_utt, round(p$conf.int[1:2], 5), round(p$p.value, 8)))
}
```

The code, however, must be somewhat faulty: the execution takes ages and, what is more, confidence intervals are invariably NA and p-values are 0. Where's the mistake?
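For a sense of scale, here is a rough sketch of how many ways the pooled tokens can be split into the two conditions, using the totals from the comments above (1769 and 8064). This assumes that an exact test has to account for every such split in some form; whether perm.test() really walks through them one by one is not something I have checked. The name `log10_splits` is just for illustration.

```r
# Totals from the actual (much bigger) data frame, as quoted in the comments above
n_cd  <- 1769
n_utt <- 8064

# Number of ways to pick which n_cd of the pooled tokens count as constructed dialog.
# choose() overflows for numbers this large, so work on the log scale with lchoose().
log10_splits <- lchoose(n_cd + n_utt, n_cd) / log(10)

cat("Number of possible splits is roughly 10^", round(log10_splits), "\n", sep = "")
```

Even if the implementation is much smarter than plain enumeration, a count of this size suggests that exact=TRUE may simply not be feasible for vectors this long.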

Chris

### Christoph Ruehlemann

May 6, 2018, 6:57:44 AM

To make the sample reproducible, I should note that perm.test() is part of the exactRankTests package. So here's the full code:

```
# load package 'exactRankTests' (install it first) for perm.test:
library(exactRankTests)

# data:
ALL <- data.frame(
  Units = c("yeah", "mm", "no", "oh", "yes"),
  FREQinCD = c(12, 1, 19, 13, 6),
  FREQinUTT = c(352, 199, 122, 72, 70)
)

# Total number of tokens:
n_cd <- sum(ALL$FREQinCD)   # Total number of tokens in constructed dialog (1769 in the actual, much bigger df)
n_utt <- sum(ALL$FREQinUTT) # Total number of tokens in utterances (8064 in the actual df)

# run perm.test:
```

### ludovic de cuypere

May 10, 2018, 5:11:07 AM

Dear Chris

My guess would be to change exact=TRUE to exact=FALSE to obtain an approximation of the p-value. I believe exact=TRUE considers all permutations (which can be computationally expensive), while exact=FALSE falls back on an approximation instead.
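As an illustration of what an approximate p-value from randomly sampled permutations could look like, here is a minimal sketch in base R that does not use perm.test() at all. The helper `mc_perm_p`, the number of shuffles `B` and the result columns `p_mc` and `p_fisher` are made up for this example. The test statistic is assumed to be the difference in relative frequencies between the two conditions. The fisher.test() column is only a cross-check: for 0/1 data it is also an exact test conditional on the totals, although its two-sided p-value is defined slightly differently, so the two columns need not agree exactly.

```r
# Sample data from the thread (with the commas between columns in place)
ALL <- data.frame(
  Units     = c("yeah", "mm", "no", "oh", "yes"),
  FREQinCD  = c(12, 1, 19, 13, 6),
  FREQinUTT = c(352, 199, 122, 72, 70),
  stringsAsFactors = FALSE
)
n_cd  <- sum(ALL$FREQinCD)
n_utt <- sum(ALL$FREQinUTT)

# Hypothetical helper: Monte Carlo permutation p-value for one unit.
# B random relabelings are drawn instead of enumerating all of them.
mc_perm_p <- function(x_cd, x_utt, n_cd, n_utt, B = 1999) {
  # Pooled 0/1 vector: 1 = token is the unit of interest, 0 = any other token
  pooled <- c(rep(1, x_cd + x_utt), rep(0, n_cd + n_utt - x_cd - x_utt))
  # Difference in relative frequency, given how many 1s fall into the CD group
  diff_prop <- function(ones_cd) {
    ones_cd / n_cd - (x_cd + x_utt - ones_cd) / n_utt
  }
  observed <- diff_prop(x_cd)
  # Each replicate draws n_cd tokens at random to play the role of the CD group
  permuted <- replicate(B, diff_prop(sum(sample(pooled, n_cd))))
  # Two-sided p-value; the +1 counts the observed split as one of the permutations
  (1 + sum(abs(permuted) >= abs(observed) - 1e-12)) / (B + 1)
}

set.seed(1)  # only to make the sketch repeatable
res <- data.frame(
  Units = ALL$Units,
  p_mc = mapply(mc_perm_p, ALL$FREQinCD, ALL$FREQinUTT,
                MoreArgs = list(n_cd = n_cd, n_utt = n_utt)),
  p_fisher = mapply(function(a, b) {
    fisher.test(matrix(c(a, n_cd - a, b, n_utt - b), nrow = 2))$p.value
  }, ALL$FREQinCD, ALL$FREQinUTT)
)
print(res)
```

With B random shuffles the smallest p-value this can report is 1/(B + 1), so B would need to be raised if very small p-values matter.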

Best

Ludovic
