aregImpute subscript out of bounds

Jonathan Chipman

Apr 24, 2015, 5:03:08 PM
to reg...@googlegroups.com
Hi Dr. Harrell,

I keep getting an aregImpute error: "Error in `[<-`(`*tmp*`, , (j + 1):(j + m), value = c(63.4333333333333,  : subscript out of bounds".  I'm not really sure what the cause of this error may be.  Do you have any insights on what I should consider?

Thank you, Jonathan
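
For readers unfamiliar with this message: R reports "subscript out of bounds" from `[<-` when a matrix assignment targets rows or columns that do not exist. The sketch below reproduces that class of error in base R. It is a toy illustration only; whether the matrix inside areg ends up too narrow for the same reason is an assumption, not something the error alone confirms.

```r
# A 3 x 2 matrix: the only valid column indices are 1 and 2.
x <- matrix(0, nrow = 3, ncol = 2)

# Assigning into columns 3:4, which do not exist, raises an error.
# tryCatch() captures the message instead of stopping the script.
msg <- tryCatch(
  {
    x[, 3:4] <- 1
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```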

Frank Harrell

Apr 25, 2015, 8:58:51 AM
to reg...@googlegroups.com
Run traceback() any time you receive an error.  To fix this I'll probably need a minimal failing example to work from.

Jonathan Chipman

Apr 25, 2015, 11:03:14 AM
to reg...@googlegroups.com
Thanks!  Here's the traceback, which includes the code I ran.  Sorry, I'm not sure I can create a short example to replicate the same error that I get from my data.  That said, I realize I'm not giving you much to go on with this question and that there may not be an answer to the question ... 

--See traceback() below for initial call--
Iteration 1 Iteration 2 
Error in `[<-`(`*tmp*`, , (j + 1):(j + m), value = c(2.53333333333333,  : 
  subscript out of bounds
In addition: There were 11 warnings (use warnings() to see them)
> traceback()
2: areg(X[s, ], xf[s, i], xtype = vtype[-i], ytype = ytype, nk = min(nk), 
       na.rm = FALSE, tolerance = tolerance)
1: aregImpute(~Outcome + Age + sd.pt.rr + xc.hr.resp + xc.hr.spo2 + 
       xc.resp.spo2 + mean.pt.rr + mean.hr + mean.resp + mean.spo2 + 
       sd.hr + sd.resp + sd.spo2 + nsbp + ndbp + nmbp + isbp + idbp + 
       imap + COSEn + RR + SD.RR + DFA + PAF + PNSR + PECT + LDS1 + 
       DS + BIN.1 + BIN.2 + BIN.3 + BIN.4 + BIN.5 + BIN.6 + BIN.7 + 
       BIN.8 + BIN.9 + BIN.10 + BIN.11 + BIN.12 + CO2 + CO2.tsl + 
       PHOSPHORUS + PARTIAL.THROMBOPLASTIN.TIME + GLUCOSE + CALCIUM + 
       BLOOD.UREA.NITROGEN + TOTAL.PROTEIN + TOTAL.BILIRUBIN + AST..GOT. + 
       CREATININE + ALBUMIN + ALKALINE.PHOSPHATASE + ALT..GPT. + 
       SODIUM + OXYGEN.SATURATION + POTASSIUM + PO2 + BICARBONATE + 
       PH.ARTERIAL + PCO2 + BASE.EXCESS + WHITE.BLOOD.CELL.COUNT + 
       MAGNESIUM + PLATELET.COUNT + HEMATOCRIT + HEMOGLOBIN + PROTIME + 
       PROTIME.INR + LACTIC.ACID + FIO2 + NEUTROPHILS.PERCENT + 
       TROPONIN.I + lab.grp1a.tsl + lab.grp1b.tsl + lab.grp2.tsl + 
       lab.grp3.tsl + lab.grp4.tsl + lab.grp5.tsl + lab.grp6.tsl + 
       PARTIAL.THROMBOPLASTIN.TIME.tsl + ALBUMIN.tsl + OXYGEN.SATURATION.tsl + 
       PROTIME.tsl + PROTIME.INR.tsl + LACTIC.ACID.tsl + FIO2.tsl + 
       NEUTROPHILS.PERCENT.tsl + TROPONIN.I.tsl, data = d2, n.impute = 100)

The 11 warnings are:
1: In rcspline.eval(z, knots = parms, nk = nk, inclx = TRUE) :
  could not obtain 3 interior knots with default algorithm.
 Used alternate algorithm to obtain 3 knots

Lucy D'Agostino

Apr 25, 2015, 11:29:25 AM
to reg...@googlegroups.com
Hi Jonathan,

I think if you use the match="closest" option, you won't get the error any more. 

Let me know if that works!

Lucy

Jonathan Chipman

Apr 25, 2015, 11:40:13 AM
to reg...@googlegroups.com
Wow, thanks Lucy!  I'll give that a try and let you know.

Jonathan Chipman

Apr 25, 2015, 12:02:40 PM
to reg...@googlegroups.com
Bummer, I still get the error.

Lucy D'Agostino

Apr 25, 2015, 12:10:19 PM
to reg...@googlegroups.com
What if you change it to match="kclosest"?  This is a bit of trial and error, but I tried to make some data that produces that error, and it doesn't happen in my case with kclosest.

Jonathan Chipman

Apr 25, 2015, 12:41:26 PM
to reg...@googlegroups.com
Thanks.  Unfortunately, that hasn't worked either.  I also tried the predictive mean matching type (pmmtype) options.

Frank Harrell

Apr 25, 2015, 2:33:19 PM
to reg...@googlegroups.com
It's probably a singularity due to one of the following:
  1. a factor variable with one level that has a very low frequency
  2. a numeric variable with excessive ties that is hard to model with a spline
  3. too high default number of knots
  4. too many variables
You can experiment with the global number of knots, or try to figure out which variable is the offender and model it linearly using I(x).  Alternatively, remove the last 5 variables and retry, and keep removing blocks of 5 variables until it no longer bombs; then back up to find out which variable, when removed, makes the error go away.  While doing all this, use n.impute=1.
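
The block-removal search described above can be sketched as follows. This is only an illustration, not code from Hmisc: find_offender, fit_fun, and stub_fit are hypothetical names, and the stub simply pretends that a variable called x9 triggers the error. With real data, fit_fun would wrap the aregImpute call, as shown in the comment.

```r
# Drop blocks of variables from the end until the fit succeeds, then
# re-add the dropped variables one at a time to see which ones fail.
# `fit_fun` is any function that takes a character vector of variable
# names and either returns normally or signals an error.
find_offender <- function(vars, fit_fun, block = 5) {
  works <- function(v) {
    !inherits(try(fit_fun(v), silent = TRUE), "try-error")
  }

  # Step 1: remove the last `block` variables until the fit runs.
  kept <- vars
  while (length(kept) > 0 && !works(kept)) {
    kept <- head(kept, max(length(kept) - block, 0))
  }
  removed <- tail(vars, length(vars) - length(kept))

  # Step 2: back up, adding removed variables back one by one.
  offenders <- character(0)
  for (v in removed) {
    if (works(c(kept, v))) {
      kept <- c(kept, v)
    } else {
      offenders <- c(offenders, v)
    }
  }
  offenders
}

# With real data the fitting function might look like this
# (d2 is the data frame from the thread; n.impute = 1 keeps it fast):
# fit_fun <- function(v) {
#   Hmisc::aregImpute(reformulate(v), data = d2, n.impute = 1)
# }

# Hypothetical stub standing in for aregImpute: fails if x9 is present.
stub_fit <- function(v) {
  if ("x9" %in% v) stop("subscript out of bounds")
  invisible(TRUE)
}

vars <- paste0("x", 1:12)
offenders <- find_offender(vars, stub_fit)
print(offenders)
```

A variable flagged this way fails only relative to the variables already kept, so with a singularity it may be one member of a collinear group rather than the sole cause.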

Jonathan Chipman

Apr 25, 2015, 3:06:30 PM
to reg...@googlegroups.com
That's a nice strategy for solving the problem, thank you.

I believe I can rule out both (1) a factor variable with an infrequent level and (3) too many default knots.  Once the offending covariate(s) is/are found, how would you consider moving forward?

1. If it's a single offending covariate, would you run aregImpute on all other variables and use transcan or a crude implementation for the single covariate?
2. If too many variables, would it be reasonable to run aregImpute on two different sets: (1) the outcome and ekg values and (2) the outcome and lab values?  I know this is not ideal, since it doesn't use all the data at the same time.

Thank you.

Frank Harrell

Apr 25, 2015, 6:07:50 PM
to reg...@googlegroups.com
If it is a single offending variable try using I() to force it to be linear.  Otherwise you may have to remove the variable.
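
As a rough sketch of the I() suggestion, the helper below wraps chosen variable names in I() and builds a one-sided formula with base R's reformulate(). The helper name linearize is hypothetical, and LACTIC.ACID is picked purely for illustration; the thread does not say which variable is the offender.

```r
# Build a one-sided formula in which selected variables are wrapped
# in I() so that they are forced to be modeled linearly.
linearize <- function(vars, linear_vars) {
  terms <- ifelse(vars %in% linear_vars,
                  paste0("I(", vars, ")"),
                  vars)
  reformulate(terms)
}

# Illustrative subset of the variables from the thread.
f <- linearize(c("Outcome", "Age", "LACTIC.ACID"), "LACTIC.ACID")
print(f)

# The formula could then be passed on, for example:
# imp <- Hmisc::aregImpute(f, data = d2, n.impute = 1)
```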

If there are too many variables and setting the number of knots to zero (to treat all variables as linear) doesn't work, then using two different sets may be OK, but fit.mult.impute does not know how to put them together.

You might do a redundancy analysis up front to limit the number of variables.
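
One possible way to run such a redundancy analysis is Hmisc's redun() function. The sketch below uses simulated data (not the data from this thread) in which x3 is nearly a sum of x1 and x2, and the r2 = 0.9 cutoff is just an illustrative choice. It assumes Hmisc is installed and skips the analysis otherwise.

```r
# Simulated data: x3 is almost exactly x1 + x2, so one of the three
# variables should be predictable from the others.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2 + rnorm(n, sd = 0.01)
x4 <- rnorm(n)
sim <- data.frame(x1, x2, x3, x4)

res <- NULL
if (requireNamespace("Hmisc", quietly = TRUE)) {
  # Flag variables that the remaining ones predict with R^2 >= 0.9.
  res <- Hmisc::redun(~ x1 + x2 + x3 + x4, data = sim, r2 = 0.9)
  print(res$Out)  # variables judged redundant
  print(res$In)   # variables retained
}
```

The retained variables in res$In could then be used to build a smaller formula for aregImpute.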

Jonathan Chipman

Apr 25, 2015, 7:54:54 PM
to reg...@googlegroups.com
Thank you!