Chi square test in Julia?


Arin Basu

Feb 8, 2015, 6:32:48 PM
to julia...@googlegroups.com
Hi All,

Please pardon my ignorance, but how does one do a chi-square test in Julia? Something like

```

chisq.test(x, y = NULL, correct = TRUE,
           p = rep(1/length(x), length(x)), rescale.p = FALSE,
           simulate.p.value = FALSE, B = 2000)
```
in R.

I could not find anything in the documentation. Perhaps I have not searched thoroughly enough; what would the equivalent be?

Best,
Arin

Simon Byrne

Feb 10, 2015, 5:35:42 AM
to julia...@googlegroups.com
Hi Arin,

It appears that there isn't a chi-square test at the moment. If you want to implement one yourself, you can compute the quantiles of a chi-squared distribution using Distributions.jl (if you do this, please consider submitting it to HypothesisTests.jl). Alternatively, you could call the R function via RCall.jl or Rif.jl.
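
For a goodness-of-fit test, the do-it-yourself route could look roughly like the sketch below. It assumes Distributions.jl is installed; `chisq_gof` is a hypothetical helper name, not a function from any package, and the uniform default for `p` mirrors the default in the R call from the question.

```julia
using Distributions  # provides Chisq, ccdf and quantile

# Hypothetical helper (not part of any package): Pearson goodness-of-fit test
# of observed counts against probabilities p (uniform by default).
function chisq_gof(observed::AbstractVector{<:Integer};
                   p::AbstractVector{<:Real} = fill(1 / length(observed), length(observed)))
    n = sum(observed)
    expected = n .* p                                  # expected counts under the null
    stat = sum((observed .- expected) .^ 2 ./ expected)
    df = length(observed) - 1
    pval = ccdf(Chisq(df), stat)                       # upper-tail probability
    crit = quantile(Chisq(df), 0.95)                   # critical value at the 5% level
    return (statistic = stat, df = df, pvalue = pval, critical = crit)
end

result = chisq_gof([1, 2, 3, 4])
```

For the counts [1, 2, 3, 4] this gives a statistic of 2.0 on 3 degrees of freedom.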

Simon

Benjamin Deonovic

Mar 5, 2015, 9:32:32 AM
to julia...@googlegroups.com
I implemented the chi-square test in Julia and opened a pull request against the HypothesisTests package. It hasn't been merged yet, but probably will be soon.

Benjamin Deonovic

Mar 6, 2015, 8:58:20 AM
to julia...@googlegroups.com

Joshua Duncan

Nov 10, 2015, 10:10:10 AM
to julia-users
I found your chi-square test and am trying to use it. It appears to work with one array but not with two. My steps are below:

This works:
ChisqTest([1,2,3,4])

This doesn't:
ChisqTest([1,2,3,4],[1,2,2,4])

It errors with the following:
LoadError: MethodError: `ChisqTest` has no method matching ChisqTest(::Array{Int64,1}, ::Array{Int64,1})
Closest candidates are:
  ChisqTest{T<:Integer}(::AbstractArray{T<:Integer,1}, ::AbstractArray{T<:Integer,1}, !Matched::Tuple{UnitRange{T<:Integer},UnitRange{T<:Integer}})
  ChisqTest{T<:Integer}(::AbstractArray{T<:Integer,1}, ::AbstractArray{T<:Integer,1}, !Matched::T<:Integer)
  ChisqTest{T<:Integer,U<:AbstractFloat}(::AbstractArray{T<:Integer,1}, !Matched::Array{U<:AbstractFloat,1})
  ...
while loading In[118], in expression starting on line 1


I have checked the arrays to make sure they're AbstractArrays and the result is true.

Any advice would be helpful; I might just be using it wrong.

cormu...@mac.com

Nov 10, 2015, 4:37:30 PM
to julia-users
Try a matrix?

Benjamin Deonovic

Nov 10, 2015, 4:48:33 PM
to julia-users
Hey Joshua,

Just saw your post. I will look into what the issue is. I wrote this quite a while ago, when I was just learning Julia!

Benjamin Deonovic

Nov 10, 2015, 5:21:16 PM
to julia-users
Okay, I have figured out the issue. I will fix it so it works the way you expected. Until the fix goes live, this should work:

ChisqTest([1,2,3,4],[1,2,2,4], 4)

*Note the trailing 4.

The issue is that when I submitted the code to HypothesisTests.jl, the only way to create a contingency table from two vectors x and y was to also provide the levels that the categorical variables could take. Afterwards I submitted a version of ``counts`` to StatsBase.jl that had default values, but I forgot to update my code in HypothesisTests.jl. Sorry for the late response!


**Note that the above gives you the equivalent of the following in R:

> chisq.test(matrix(c(1,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1),nrow = 4,ncol = 4))

   
Pearson's Chi-squared test

data:  matrix(c(1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1), nrow = 4,     ncol = 4)
X-squared = NaN, df = 9, p-value = NA

Warning message:
In chisq.test(matrix(c(1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0,  :
  Chi-squared approximation may be incorrect
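
The NaN follows from the empty column for level 3: its expected counts are zero, so the Pearson sum contains 0/0 terms. A minimal sketch in plain Julia, assuming integer-coded levels 1:k; `crosstab_levels` is a hypothetical helper, not the StatsBase.jl ``counts`` function:

```julia
# Hypothetical helper: cross-tabulate two integer vectors over fixed levels 1:k.
function crosstab_levels(x, y, k::Integer)
    table = zeros(Int, k, k)
    for (xi, yi) in zip(x, y)
        table[xi, yi] += 1
    end
    return table
end

observed = crosstab_levels([1, 2, 3, 4], [1, 2, 2, 4], 4)

# Expected counts under independence: (row total * column total) / n.
expected = sum(observed, dims = 2) * sum(observed, dims = 1) / sum(observed)

# Column 3 is all zeros, so its terms are 0/0 and the sum becomes NaN.
statistic = sum((observed .- expected) .^ 2 ./ expected)
println(isnan(statistic))
```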



However you are probably interested in

> chisq.test(c(1,2,3,4),c(1,2,2,4))

   
Pearson's Chi-squared test

data:  c(1, 2, 3, 4) and c(1, 2, 2, 4)
X-squared = 8, df = 6, p-value = 0.2381


When I wrote the function, there wasn't a good equivalent of R's ``table`` function. I will try to flesh out this code so it works more like R. In the meantime, the best thing is to work with the full contingency table, i.e.:

julia> ChisqTest([1 0 0; 0 1 0; 0 1 0; 0 0 1])
Pearson's Chi-square Test
-------------------------
Population details:
 parameter of interest: Multinomial Probabilities
 value under h_0: [0.0625,0.0625,0.0625,0.0625,0.125,0.125,0.125,0.125,0.0625,0.0625,0.0625,0.0625]
 point estimate: [0.25,0.0,0.0,0.0,0.0,0.25,0.25,0.0,0.0,0.0,0.0,0.25]
 95% confidence interval: [(0.0,1.0),(0.0,1.0),(0.0,1.0),(0.0,1.0),(0.0,1.0),(0.0,1.0),(0.0,1.0),(0.0,1.0),(0.0,1.0),(0.0,1.0),(0.0,1.0),(0.0,1.0)]
 
Test summary:
 outcome with 95% confidence: fail to reject h_0
 two-sided p-value: 0.23810330555354436 (not significant)
 
Details:
 Sample size: 4
 statistic: 8.0
 degrees of freedom: 6
 residuals: [1.5,-0.5,-0.5,-0.5,-0.7071067811865475,0.7071067811865475,0.7071067811865475,-0.7071067811865475,-0.5,-0.5,-0.5,1.5]
 std. residuals: [2.0,-0.6666666666666666,-0.6666666666666666,-0.6666666666666666,-1.1547005383792517,1.1547005383792517,1.1547005383792517,-1.1547005383792517,-0.6666666666666666,-0.6666666666666666,-0.6666666666666666,2.0]
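
The statistic and degrees of freedom in this output can be reproduced by hand. The following is a sketch of the textbook Pearson computation in plain Julia, not the HypothesisTests.jl implementation:

```julia
# The 4×3 contingency table passed to ChisqTest above.
observed = [1 0 0; 0 1 0; 0 1 0; 0 0 1]

n = sum(observed)
rowtotals = sum(observed, dims = 2)        # 4×1 matrix of row sums
coltotals = sum(observed, dims = 1)        # 1×3 matrix of column sums

# Expected counts under independence: outer product of the margins over n.
expected = rowtotals * coltotals / n

# Pearson residuals and the chi-square statistic.
residuals = (observed .- expected) ./ sqrt.(expected)
statistic = sum(residuals .^ 2)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (size(observed, 1) - 1) * (size(observed, 2) - 1)

println((statistic, df))
```

The first residual, (1 - 0.25) / sqrt(0.25) = 1.5, matches the first entry of the residuals line in the output.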
 



Benjamin Deonovic

Nov 10, 2015, 5:32:27 PM
to julia-users
It looks like the state of Julia's "factors" (PooledDataArrays) is still changing quickly. Because of this, there hasn't been a proper implementation of R's "table" function. For now, if you want to run ChisqTest in Julia on two vectors, run it on the matrix you would obtain in R from table(x,y).
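
One way to build that matrix without a ``table`` equivalent is to count co-occurrences directly. A minimal sketch, assuming only the levels that actually occur should get a row or column; `crosstab` is a hypothetical helper, not part of StatsBase.jl or HypothesisTests.jl:

```julia
# Hypothetical helper: cross-tabulate two equal-length vectors using only
# the levels that occur in the data, in sorted order.
function crosstab(x::AbstractVector, y::AbstractVector)
    length(x) == length(y) || throw(ArgumentError("x and y must have the same length"))
    xlevels = sort(unique(x))
    ylevels = sort(unique(y))
    # Map each level to its row or column index.
    xindex = Dict(level => i for (i, level) in enumerate(xlevels))
    yindex = Dict(level => j for (j, level) in enumerate(ylevels))
    table = zeros(Int, length(xlevels), length(ylevels))
    for (xi, yi) in zip(x, y)
        table[xindex[xi], yindex[yi]] += 1
    end
    return table
end

tab = crosstab([1, 2, 3, 4], [1, 2, 2, 4])
```

For these two vectors the result is the 4×3 matrix used in the ChisqTest call earlier in the thread, so it can be passed to ChisqTest in the same way.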

Joshua Duncan

Nov 12, 2015, 9:40:53 AM
to julia-users
Thanks Benjamin! Creating a matrix out of my two arrays works just fine. I'll go that way for now.

Josh