Oh, nice. I should have searched. I'm sure there's a lot of
interesting stuff in there.
I just looked up when pivoting was added to scipy. I (somewhat) recall it
being worked on around the same time I was adding QR for ANOVA.
> Kevin is against adding it.
Bummer. I see the argument both ways, I guess. That covers everything on
my to-investigate list, at least.
> I'm in general in favor of having an option to get something similar to R or Stata, but have not looked at the details yet.
I found the math here to be a succinct, helpful refresher, if interested.
https://johnwlambert.github.io/least-squares/
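For reference, the basic QR route to least squares from that page can be sketched with scipy (toy data of my own; assumes a full-rank X and no pivoting):

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

# Factor X = QR, then solve the triangular system R beta = Q' y.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.01 * rng.standard_normal(20)

Q, R = qr(X, mode='economic')   # thin QR: Q is 20x3, R is 3x3
beta = solve_triangular(R, Q.T @ y)

print(beta)  # close to beta_true
```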
> Question is how to add the option, for sure not as default.
Yeah, I also added a "pqr" option to the linear models, but it wasn't
the default.
I'd vote for keeping it in the linear models with NaNs if the added
complexity in the derived-results code isn't too gross.
> AFAR, R also has a very low threshold of only 1e-7 for collinearity, which is quite different from our numpy rcond threshold.
Ah, that's a good point. It didn't appear that way in lm_robust when I
was trying to understand the differences. As far as I could tell, it
relies on Eigen's automagic threshold, which I assume is similar to the
default in np.linalg.matrix_rank. I stopped short of digging into what
Eigen does beyond dgeqp3. I'll check this again, though.
https://github.com/cran/estimatr/blob/1d642a4c1fb42730d663d519261bf38372542b7f/src/lm_robust_helper.cpp#L112
https://eigen.tuxfamily.org/dox/classEigen_1_1ColPivHouseholderQR.html#ae712cdc9f0e521cfc8061bee58ff55ee
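To make the gap concrete, here's a toy comparison of my own construction (not from either codebase): numpy's default cutoff in matrix_rank is on the order of machine eps times the largest dimension, while an R-style 1e-7 tolerance is many orders of magnitude looser, so a singular value in between gets counted by one and dropped by the other.

```python
import numpy as np

# Smallest singular value (1e-10) sits between numpy's default cutoff
# (~ smax * max(shape) * eps, here about 6.7e-16) and a 1e-7 tolerance.
A = np.diag([1.0, 1.0, 1e-10])

rank_default = np.linalg.matrix_rank(A)            # numpy default cutoff
rank_r_style = np.linalg.matrix_rank(A, tol=1e-7)  # absolute tol; smax is 1 here

print(rank_default, rank_r_style)  # 3 2
```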
Anyway, the ranks were the same. Only the values of R for the collinear
columns, and the resulting ordering and permutations, were different, so
it dropped a different one of the collinear columns. I can only guess
why one column gets dropped rather than another without thinking about
it a bit more than I have.
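For what it's worth, here's the kind of thing I mean, sketched with scipy's pivoted QR on toy data (dgeqp3 pivots on the largest remaining column norm, so ties between exactly collinear columns come down to ordering):

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(0)
x1 = rng.standard_normal(10)
x3 = rng.standard_normal(10)
X = np.column_stack([x1, x1, x3])  # columns 0 and 1 are exact duplicates

# Pivoted QR (LAPACK dgeqp3 under the hood): piv gives the column order,
# and a near-zero trailing diagonal of R flags the redundant column.
Q, R, piv = qr(X, pivoting=True)
rank = int(np.sum(np.abs(np.diag(R)) > 1e-7 * np.abs(R[0, 0])))

print(piv, rank)  # one of the duplicate columns lands in the last pivot slot
```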
Skipper