Dimension Mismatch problem when predicting from the regression with pooled variables


Jessica Koh

May 31, 2016, 12:19:02 PM
to julia-stats
Hi all,

I am having a problem predicting from a regression with pooled variables, by which I mean variables created with the pool() function. I pooled the variable by group (36 groups in total), so putting the pooled variable in the regression automatically fits it with indicators for the 36 groups (some are dropped due to collinearity). My current code looks something like this:

# Create pooled data array from group_index column 
sampledata[:group_pooled] = pool(sampledata[:group_index])

# Run regression 
IPW_treat_fml = Formula(:attr_treat, :group_pooled)
IPW_treat_reg = glm(IPW_treat_fml, sampledata, Normal(), IdentityLink())

# Predict
predict(IPW_treat_reg, sampledata)


However, predict(IPW_treat_reg, sampledata) does not work and gives the error DimensionMismatch("second dimension of A, 36, does not match length of x, 35"). If I write predict(IPW_treat_reg) instead, the code works, but I need to pass sampledata to predict in order to see the NA predictions as well; predict(IPW_treat_reg) drops all the NA results.
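
To illustrate where that message likely comes from: its wording matches what Julia raises for a matrix-vector product whose sizes disagree, so presumably the design matrix rebuilt from sampledata has 36 columns while the fitted model holds only 35 coefficients. A minimal sketch in plain Julia, with made-up values (X_new and beta are hypothetical stand-ins, not GLM internals):

```julia
# Hypothetical stand-ins: a 5-row design matrix with 36 columns and a
# coefficient vector with only 35 entries.
X_new = ones(5, 36)
beta = ones(35)

# The product X_new * beta is undefined for these sizes, so Julia throws
# a DimensionMismatch instead of returning predictions.
mismatch = try
    X_new * beta
    false
catch err
    isa(err, DimensionMismatch)
end

println(mismatch)
```

This only reproduces the kind of error, not its cause; why the two sizes differ in the real model is a separate question.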

Any help will be greatly appreciated! 





Jessica Koh

May 31, 2016, 5:01:52 PM
to julia-stats
Okay, so I came up with a temporary solution for this.

I think predict(IPW_treat_reg, sampledata) does not work in this case because, as I said, some of the pooled indicators are dropped due to collinearity. predict(IPW_treat_reg) works, although it only returns predictions for rows where the dependent variable is not NA. I don't need predictions for the rows with an NA dependent variable, so I decided to do the following.

sampledata[:predict] = 0.0   # Make sure that it is a float!

# Compute the fitted values once instead of on every loop iteration.
pred_values = predict(IPW_treat_reg)

p_index = 1   # Index into pred_values; incremented in the loop.

# Note: this assumes the only rows dropped from the fit are those with NA in attr_treat.
for i in 1:length(sampledata[:attr_treat])
    if !isna(sampledata[i, :attr_treat])
        sampledata[i, :predict] = pred_values[p_index]
        p_index = p_index + 1
    else
        sampledata[i, :predict] = NA
    end
end

The code above creates a new column called "predict" in sampledata that holds NA where the dependent variable is NA and the predicted value otherwise.

Let me know if there is an easier way to do this!
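
One possibly simpler variant is to build a boolean mask of the rows that entered the fit and assign all fitted values in one step. The following is only a sketch in plain Julia under stated assumptions: y, fitted_vals, used and pred are hypothetical names, NaN stands in for NA, and fitted_vals plays the role of predict(IPW_treat_reg).

```julia
# Hypothetical stand-ins for the real data: NaN marks a missing response,
# and fitted_vals holds one value per row that was actually used in the fit.
y = [1.0, NaN, 3.0, NaN, 5.0]
fitted_vals = [1.1, 2.9, 5.2]

# true for rows that entered the regression
used = map(v -> !isnan(v), y)

# Start with "missing" everywhere, then write the fitted values into the
# used rows, in order.
pred = fill(NaN, length(y))
pred[used] = fitted_vals

println(pred)
```

With the DataArrays types used in this thread, NA and isna would take the place of NaN and isnan. As with the loop, this assumes that the only rows dropped from the fit are those with NA in attr_treat; otherwise the mask and the fitted values would not line up.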