On Wednesday, June 24, 2015 at 09:25 -0700, Matthieu wrote:
> Thanks.
>
> The current version of the package now estimates models with
> instrumental variables (2SLS), high-dimensional fixed effects, and
> White / clustered standard errors. This makes it possible to estimate
> a large share of the models used in applied economics research.
> Moreover, this function seems faster than the corresponding Stata and
> R functions (areg and lfe, respectively), in particular for models
> with one high-dimensional fixed effect.
I'm not very familiar with these models, but that looks really nice.
Have you considered using the fit() function with a model type to be
more similar to GLM.jl?
> Two more points distinguish this function from the lm function in
> GLM:
>
> 1. The regression result object is very light (basically the initial
> formula, a vector of coefficients, and a covariance matrix). In
> contrast, since GLM's output contains the original DataFrame, the
> converted regressor matrix, the model response, etc., it can actually
> take much more space than the initial DataFrame.
> I have chosen to return a light object because it makes it possible to
> estimate multiple models without using more RAM at every step. Methods
> such as predict and residual can still be defined, as long as the user
> provides a DataFrame.
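A minimal Julia sketch of what such a light result object could look like. The names `RegResult`, `predict`, and `residual` are hypothetical (not the package's API), and a `NamedTuple` of columns stands in for a DataFrame so the example needs no packages. The object keeps only names, coefficients, and a covariance matrix, so its size depends on the number of regressors rather than the number of observations; predictions need the data passed back in.

```julia
# Hypothetical light regression result: no copy of the data is stored.
struct RegResult
    yname::Symbol              # name of the response variable
    xnames::Vector{Symbol}     # regressor names (intercept kept separately)
    intercept::Float64
    coef::Vector{Float64}      # slope coefficients, in the order of xnames
    vcov::Matrix{Float64}      # covariance matrix of (intercept, coef...)
end

# The user supplies the data again; a NamedTuple of equal-length
# column vectors stands in for a DataFrame in this sketch.
function predict(r::RegResult, data::NamedTuple)
    yhat = fill(r.intercept, length(data[first(r.xnames)]))
    for (name, b) in zip(r.xnames, r.coef)
        yhat .+= b .* data[name]
    end
    return yhat
end

residual(r::RegResult, data::NamedTuple) = data[r.yname] .- predict(r, data)

# Usage: y = 1 + 2x
r = RegResult(:y, [:x], 1.0, [2.0], zeros(2, 2))
df = (x = [0.0, 1.0, 2.0], y = [1.5, 2.5, 5.0])
predict(r, df)     # returns [1.0, 3.0, 5.0]
residual(r, df)    # returns [0.5, -0.5, 0.0]
```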
I agree that's likely a good idea. With data sources like databases, it
wouldn't make any sense to try saving all of the data with the model.
We could imagine adding an argument to keep a copy of the data, if it
turns out that's needed.
I think the only case where having the data in the model object is
useful is when calling predict(). Maybe it would be possible to save
just the name of the data frame, and use it if it's in scope?
> 2. The function has an argument that changes the way standard errors
> are computed. In R, corrected standard errors are generally computed in
> a second step, through a separate package such as vcov or multiwayvcov.
> This strikes me as inefficient and counterintuitive.
>
> I've defined an abstract type AbstractVcov. Any user can define a new
> type (a subtype of this abstract type), as long as he/she defines a
> method, vcov, that acts on a regressor matrix (X), a cross-product
> matrix (X'X in the simple case), and a vector of residuals. This seems
> enough to define a wide range of standard errors.
>
> So far I've defined only three types (simple, White, clustered).
> For instance, to estimate a model with White robust standard errors:
> reg(formula, df, VceWhite())
>
> To estimate a model with clustered standard errors:
> reg(formula, df, VceCluster(:clustervar))
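A minimal Julia sketch of how this dispatch-based design could work. The type names `AbstractVcov`, `VceWhite`, and `VceCluster` come from the message above; `VceSimple`, the fields, the toy `ols` function, and the formulas (HC0 White errors and cluster-robust errors, both without small-sample corrections) are illustrative assumptions, not the package's code. Here `VceCluster` takes a vector of cluster ids rather than a column name, since there is no DataFrame.

```julia
using LinearAlgebra

abstract type AbstractVcov end

struct VceSimple <: AbstractVcov end
struct VceWhite <: AbstractVcov end
struct VceCluster <: AbstractVcov
    groups::Vector{Int}    # cluster id of each observation (assumed already extracted)
end

# Each subtype only needs a vcov method on (X, X'X, residuals).
function vcov(::VceSimple, X::AbstractMatrix, XtX::AbstractMatrix, e::AbstractVector)
    n, k = size(X)
    return (dot(e, e) / (n - k)) * inv(XtX)     # sigma^2 (X'X)^-1
end

function vcov(::VceWhite, X::AbstractMatrix, XtX::AbstractMatrix, e::AbstractVector)
    bread = inv(XtX)
    meat = X' * (X .* e .^ 2)                   # sum_i e_i^2 x_i x_i' (HC0)
    return bread * meat * bread
end

function vcov(v::VceCluster, X::AbstractMatrix, XtX::AbstractMatrix, e::AbstractVector)
    bread = inv(XtX)
    meat = zeros(size(X, 2), size(X, 2))
    for g in unique(v.groups)
        idx = findall(==(g), v.groups)
        s = X[idx, :]' * e[idx]                 # score summed within cluster g
        meat .+= s * s'
    end
    return bread * meat * bread                 # no small-sample correction
end

# Toy OLS taking the vcov type as an argument, in the spirit of
# reg(formula, df, VceWhite()); the estimation code never changes.
function ols(y::AbstractVector, X::AbstractMatrix, v::AbstractVcov = VceSimple())
    XtX = X' * X
    beta = XtX \ (X' * y)
    e = y - X * beta
    return beta, vcov(v, X, XtX, e)
end

# Usage
X = [ones(6) collect(1.0:6.0)]
y = [1.0, 2.5, 2.9, 4.2, 5.1, 5.8]
beta, V = ols(y, X, VceWhite())
_, Vc = ols(y, X, VceCluster([1, 1, 2, 2, 3, 3]))
```

Adding a new kind of standard error only requires a new subtype and one `vcov` method; `ols` itself is untouched.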
Sounds cool. I had opened an issue in GLM.jl about this:
https://github.com/JuliaStats/GLM.jl/issues/42
Do you have any ideas about how to handle bootstrap in the same
framework?
Regards