aregImpute subscript out of bounds

Jonathan Chipman

Apr 24, 2015, 5:03:08 PM

to reg...@googlegroups.com

Hi Dr. Harrell,

I keep getting an aregImpute error: "Error in `[<-`(`*tmp*`, , (j + 1):(j + m), value = c(63.4333333333333, : subscript out of bounds". I'm not really sure what the cause of this error may be. Do you have any insights on what I should consider?

Thank you, Jonathan

Frank Harrell

Apr 25, 2015, 8:58:51 AM

to reg...@googlegroups.com

Run traceback() any time you receive an error. To fix this I'll probably need a minimal failing example to work from.
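As a rough illustration of what this kind of error and its call stack look like, here is a minimal base R sketch. The helpers fill_block() and impute_step() are hypothetical and are not part of Hmisc; they only mimic an assignment into matrix columns that do not exist. At an interactive prompt you would simply call traceback() after the error. The sketch captures the stack with sys.calls() instead, so that it also works in a script.

```r
# Hypothetical helpers (not from Hmisc) that reproduce the same class of error.
fill_block <- function(m, j, width, value) {
  m[, (j + 1):(j + width)] <- value  # fails when j + width exceeds ncol(m)
  m
}
impute_step <- function() {
  fill_block(matrix(0, nrow = 2, ncol = 3), j = 2, width = 2, value = 1)
}

# Interactive use: run impute_step(), then traceback() at the prompt.
# Script use: record the call stack at the moment the error is signalled.
stack <- NULL
msg <- tryCatch(
  withCallingHandlers(
    impute_step(),
    error = function(e) stack <<- sys.calls()
  ),
  error = function(e) conditionMessage(e)
)

print(msg)
# First line of each call on the stack, outermost call first
calls <- vapply(as.list(stack), function(cl) deparse(cl)[1], character(1))
print(calls)
```

The deepest calls in the printed stack show which function attempted the out-of-range assignment, which is the same information traceback() reports.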

Jonathan Chipman

Apr 25, 2015, 11:03:14 AM

to reg...@googlegroups.com

Thanks! Here's the traceback, which includes the code I ran. Sorry, I'm not sure I can create a short example to replicate the same error that I get from my data. That said, I realize I'm not giving you much to go on with this question and that there may not be an answer to the question ...

--See traceback() below for initial call--

Iteration 1 Iteration 2
Error in `[<-`(`*tmp*`, , (j + 1):(j + m), value = c(2.53333333333333, :
  subscript out of bounds
In addition: There were 11 warnings (use warnings() to see them)

> traceback()
2: areg(X[s, ], xf[s, i], xtype = vtype[-i], ytype = ytype, nk = min(nk),
       na.rm = FALSE, tolerance = tolerance)
1: aregImpute(~Outcome + Age + sd.pt.rr + xc.hr.resp + xc.hr.spo2 +
       xc.resp.spo2 + mean.pt.rr + mean.hr + mean.resp + mean.spo2 +
       sd.hr + sd.resp + sd.spo2 + nsbp + ndbp + nmbp + isbp + idbp +
       imap + COSEn + RR + SD.RR + DFA + PAF + PNSR + PECT + LDS1 +
       DS + BIN.1 + BIN.2 + BIN.3 + BIN.4 + BIN.5 + BIN.6 + BIN.7 +
       BIN.8 + BIN.9 + BIN.10 + BIN.11 + BIN.12 + CO2 + CO2.tsl +
       PHOSPHORUS + PARTIAL.THROMBOPLASTIN.TIME + GLUCOSE + CALCIUM +
       BLOOD.UREA.NITROGEN + TOTAL.PROTEIN + TOTAL.BILIRUBIN + AST..GOT. +
       CREATININE + ALBUMIN + ALKALINE.PHOSPHATASE + ALT..GPT. +
       SODIUM + OXYGEN.SATURATION + POTASSIUM + PO2 + BICARBONATE +
       PH.ARTERIAL + PCO2 + BASE.EXCESS + WHITE.BLOOD.CELL.COUNT +
       MAGNESIUM + PLATELET.COUNT + HEMATOCRIT + HEMOGLOBIN + PROTIME +
       PROTIME.INR + LACTIC.ACID + FIO2 + NEUTROPHILS.PERCENT +
       TROPONIN.I + lab.grp1a.tsl + lab.grp1b.tsl + lab.grp2.tsl +
       lab.grp3.tsl + lab.grp4.tsl + lab.grp5.tsl + lab.grp6.tsl +
       PARTIAL.THROMBOPLASTIN.TIME.tsl + ALBUMIN.tsl + OXYGEN.SATURATION.tsl +
       PROTIME.tsl + PROTIME.INR.tsl + LACTIC.ACID.tsl + FIO2.tsl +
       NEUTROPHILS.PERCENT.tsl + TROPONIN.I.tsl, data = d2, n.impute = 100)

The 11 warnings are:

1: In rcspline.eval(z, knots = parms, nk = nk, inclx = TRUE) :
  could not obtain 3 interior knots with default algorithm.
  Used alternate algorithm to obtain 3 knots

Lucy D'Agostino

Apr 25, 2015, 11:29:25 AM

to reg...@googlegroups.com

Hi Jonathan,

I think if you use the match="closest" option, you won't get the error any more.

Let me know if that works!

Lucy

Jonathan Chipman

Apr 25, 2015, 11:40:13 AM

to reg...@googlegroups.com

Wow, thanks Lucy! I'll give that a try and let you know.

Jonathan Chipman

Apr 25, 2015, 12:02:40 PM

to reg...@googlegroups.com

Bummer, I still get the error.

Lucy D'Agostino

Apr 25, 2015, 12:10:19 PM

to reg...@googlegroups.com

What if you change it to "kclosest"? This is a bit of trial and error, but I tried to make some data that produce that error, and it doesn't happen in my case with kclosest.
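For readers following along, the two matching options mentioned in this exchange can be tried as below. This is only a sketch on simulated stand-in data: the data frame d and its columns age, hr and outcome are made up for illustration and have nothing to do with the original data set. It assumes the Hmisc package is installed.

```r
library(Hmisc)  # provides aregImpute

set.seed(2015)
n <- 150
# Simulated stand-in data; names and values are illustrative assumptions.
d <- data.frame(age = rnorm(n, 60, 10), hr = rnorm(n, 80, 12))
d$outcome <- 0.03 * d$age + 0.02 * d$hr + rnorm(n)
d$hr[sample(n, 25)] <- NA  # introduce missing values to impute

# Same model, two predictive-mean-matching donor selection rules
imp_closest  <- aregImpute(~ outcome + age + hr, data = d,
                           n.impute = 2, match = "closest", pr = FALSE)
imp_kclosest <- aregImpute(~ outcome + age + hr, data = d,
                           n.impute = 2, match = "kclosest", pr = FALSE)

print(imp_closest)
print(imp_kclosest)
```

Both calls change only how donor observations are chosen during predictive mean matching, so they would not be expected to help if the failure comes from the model fit itself.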

Jonathan Chipman

Apr 25, 2015, 12:41:26 PM

to reg...@googlegroups.com

Thanks. Unfortunately, that hasn't worked either. I also tried the predictive mean matching type (pmmtype) options.

Frank Harrell

Apr 25, 2015, 2:33:19 PM

to reg...@googlegroups.com

It's probably a singularity due to one of the following:

1. a factor variable with one level that has a very low frequency
2. a numeric variable with excessive ties that is hard to model with a spline
3. too high a default number of knots
4. too many variables

You can play with the global number of knots, try to figure out which variable is offending and make it be modeled linearly using I(x), or remove the last 5 variables and retry. Keep removing blocks of 5 variables until it doesn't bomb, then back up to find out which variable, when removed, causes it to not bomb. While doing all this, use n.impute=1.
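The block-removal search described above could be automated along these lines. This is a sketch, not code from Hmisc: find_offender() and try_aregImpute() are hypothetical helper names, and the "back up" step is implemented here by adding variables back one at a time, which is one reasonable reading of the advice. The demonstration uses a mock fitting check so it runs without any data.

```r
# Hypothetical helper: TRUE if aregImpute runs on these variables, FALSE on error.
# Uses n.impute = 1 to keep each attempt cheap. Requires Hmisc when called.
try_aregImpute <- function(vars, data) {
  tryCatch({
    Hmisc::aregImpute(reformulate(vars), data = data, n.impute = 1, pr = FALSE)
    TRUE
  }, error = function(e) FALSE)
}

# Hypothetical helper: drop trailing blocks of variables until the fit works,
# then add variables back one at a time to locate the first one that breaks it.
find_offender <- function(vars, fits_ok, block = 5, min_vars = 2) {
  if (fits_ok(vars)) return(NA_character_)   # nothing fails, nothing to find
  hi <- length(vars)                         # shortest prefix known to fail
  lo <- hi
  ok <- FALSE
  while (lo > min_vars && !ok) {
    lo <- max(lo - block, min_vars)
    ok <- fits_ok(vars[seq_len(lo)])
    if (!ok) hi <- lo
  }
  if (!ok) return(NA_character_)             # even the smallest set fails
  for (k in (lo + 1):hi) {
    if (!fits_ok(vars[seq_len(k)])) return(vars[k])
  }
  NA_character_
}

# Mock demonstration: pretend any model containing "x17" fails.
vars <- paste0("x", 1:23)
mock_fits_ok <- function(v) !("x17" %in% v)
offender <- find_offender(vars, mock_fits_ok)
print(offender)
```

With real data the call would look like find_offender(vars, function(v) try_aregImpute(v, d2)), where vars holds the variable names in the order they appear in the formula. The search only reports the first variable whose inclusion triggers the failure, so it may need to be repeated after that variable is handled.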

Jonathan Chipman

Apr 25, 2015, 3:06:30 PM

to reg...@googlegroups.com

That's a nice strategy for solving the problem, thank you.

I believe I can rule out both (1) a factor variable with an infrequent level and (3) too high a default number of knots. Once the offending covariate(s) is/are found, how would you consider moving forward?

1. If it's a single offending covariate, would you run aregImpute on all other variables and use transcan or a crude implementation for the single covariate?

2. If too many variables, would it be reasonable to run aregImpute on two different sets: (1) the outcome and ekg values and (2) the outcome and lab values? I know this is not ideal, since it does not use all the data at the same time.

Thank you.

Frank Harrell

Apr 25, 2015, 6:07:50 PM

to reg...@googlegroups.com

If it is a single offending variable try using I() to force it to be linear. Otherwise you may have to remove the variable.

If there are too many variables and setting the number of knots to zero (to treat all as linear) doesn't work, then the approach of using two different sets may be OK, but fit.mult.impute does not know how to put them together.
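The two ways of forcing linearity mentioned here can be sketched as follows, assuming Hmisc is installed. The data are simulated and the variable names (y, x1, x2, x3) are illustrative only; x2 is given many ties to stand in for a variable that is hard to model with a spline.

```r
library(Hmisc)  # provides aregImpute

set.seed(1)
n <- 200
# Simulated stand-in data; x2 is a heavily tied numeric variable.
d <- data.frame(x1 = rnorm(n), x2 = rpois(n, 1), x3 = rnorm(n))
d$y <- d$x1 + 0.5 * d$x2 + rnorm(n)
d$x1[sample(n, 20)] <- NA
d$x3[sample(n, 20)] <- NA

# Option A: force only the suspect variable to be linear with I()
imp_lin <- aregImpute(~ y + x1 + I(x2) + x3, data = d,
                      n.impute = 1, pr = FALSE)

# Option B: set nk = 0 so all continuous variables are treated as linear
imp_nk0 <- aregImpute(~ y + x1 + x2 + x3, data = d,
                      n.impute = 1, nk = 0, pr = FALSE)

print(imp_lin)
print(imp_nk0)
```

Option A keeps spline flexibility for the remaining variables, while option B trades that flexibility for stability across the whole model.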

You might do a redundancy analysis up front to limit the number of variables.
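One way to run such a redundancy analysis is Hmisc's redun function. The sketch below uses simulated data in which x4 is almost exactly x1 + x2; the variable names and the r2 cutoff of 0.9 are illustrative assumptions rather than recommendations for the data in this thread. Only candidate predictors are included in the formula, not the outcome.

```r
library(Hmisc)  # provides redun

set.seed(42)
n <- 300
# Simulated predictors; x4 is nearly a linear combination of x1 and x2.
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$x4 <- d$x1 + d$x2 + rnorm(n, sd = 0.05)

# Flag variables that can be predicted from the others with R^2 >= 0.9
red <- redun(~ x1 + x2 + x3 + x4, data = d, r2 = 0.9)
print(red)

red$Out  # variables judged redundant
red$In   # variables retained
```

Variables listed in red$Out are candidates to leave out of the aregImpute formula, which reduces the number of variables the imputation model has to handle.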

Jonathan Chipman

Apr 25, 2015, 7:54:54 PM

to reg...@googlegroups.com

Thank you!
