The names of the alternatives: if null, for a wide data.frame, they are guessed from the variable names and the choice variable (both should be the same); for a long data.frame, they are guessed from the alt.var argument.
A mlogit.data object, which is a data.frame in long format, i.e. one line for each alternative. It has an index attribute, which is a data.frame that contains the index of the choice made (chid), the index of the alternative (alt) and, if any, the index of the individual (id) and of the alternative groups (group). The choice variable is a boolean which indicates the choice made. This function uses stats::reshape() if the data.frame is in wide format.
I am reproducing some Stata code in R and I would like to perform a multinomial logistic regression with the mlogit function, from the package of the same name (I know that there is a multinom function in nnet, but I don't want to use that one).
My problem is that, to use mlogit, I need my data to be formatted using mlogit.data and I can't figure out how to format it properly. Comparing my data to the data used in the examples in the documentation and in this question, I realize that it is not in the same form.
However, I only have one column (type) which shows the choice of the individual but does not show the other alternatives or the value of the other variables for each of these alternatives. When I try to apply mlogit, I have:
Edit: following the advice of @edsandorf, I modified my data frame and mlogit.data now works, but all the other explanatory variables have the same value for each alternative. Should I set these variables to 0 in the rows where the choice variable is 0 or FALSE? (In fact, can somebody show me the procedure from where I am to the mlogit results? I don't see where I'm going wrong in the estimation.)
Your data doesn't lend itself well to estimation with an MNL model unless we make more assumptions. In general, since all your variables are individual specific and do not vary across alternatives (types), the model cannot be identified. All of your individual specific characteristics will drop out unless we treat them as alternative specific. By the sounds of it, each professional program carries meaning in and of itself. In that case, we could estimate the MNL model using constants only, where the constant captures everything about the program that makes an individual choose it.
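To make the long format concrete, here is a minimal base R sketch of the reshaping step, written by hand rather than with mlogit.data. The data frame `wide`, its columns `id`, `type` and `age`, and the program labels are all made up for illustration; they are not the asker's data. It shows why every individual specific variable ends up with the same value on each of an individual's rows, which is expected rather than a sign of an error.

```r
# Hypothetical individual-level data: one row per person,
# `type` holds the program that person chose.
wide <- data.frame(
  id   = 1:3,
  type = c("A", "C", "B"),
  age  = c(23, 31, 27),
  stringsAsFactors = FALSE
)

# The choice set is assumed to be every program observed in the data.
alts <- sort(unique(wide$type))

# Cross join: one row per (individual, alternative) pair.
long <- merge(wide, data.frame(alt = alts, stringsAsFactors = FALSE), by = NULL)

# Boolean choice indicator: TRUE only on the row of the chosen alternative.
long$choice <- long$type == long$alt

# Keep the long-format columns and sort by individual, then alternative.
long <- long[order(long$id, long$alt), c("id", "alt", "choice", "age")]
rownames(long) <- NULL

print(long)
```

Each individual contributes one row per alternative, exactly one of which has `choice` equal to TRUE, and `age` is simply repeated. The repeated values should not be zeroed out on the non-chosen rows; the model handles them through alternative specific coefficients.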
Now we can run the model. I include the dummies for each of the alternatives keeping alternative 4 as my reference level. Only J-1 constants are identified, where J is the number of alternatives. In the second half of the formula (after the pipe operator), I make sure that I remove all alternative specific constants that the model would have created and I add your individual specific variables, treating them as alternative specific. Note that this only makes sense if your alternatives (programs) carry meaning and are not generic.
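As a rough illustration of a model with J-1 constants and individual specific variables that get alternative specific coefficients, the sketch below uses the Fishing data set that ships with mlogit rather than the asker's data. It relies on the intercepts that mlogit creates by default in the second part of the formula, which is a simpler variant than the explicit dummies described above. Newer releases of mlogit steer users toward dfidx instead of mlogit.data, so treat the reshaping call as an assumption about the installed version.

```r
# Runs only if the mlogit package is installed.
if (requireNamespace("mlogit", quietly = TRUE)) {
  library(mlogit)

  # Fishing: one row per individual, four alternatives
  # (beach, pier, boat, charter), price and catch stored in columns 2 to 9.
  data("Fishing", package = "mlogit")

  # Reshape from wide to long: one row per (individual, alternative).
  Fish <- mlogit.data(Fishing, shape = "wide", varying = 2:9, choice = "mode")

  # Part 1 of the formula (before the pipe) holds alternative specific
  # variables with a generic coefficient; `0` means there are none here.
  # Part 2 (after the pipe) holds individual specific variables, each of
  # which gets one coefficient per non-reference alternative.
  fit <- mlogit(mode ~ 0 | income, data = Fish, reflevel = "beach")

  # With J = 4 alternatives: J - 1 constants and J - 1 income coefficients.
  print(coef(fit))
}
```

Setting `reflevel` picks the alternative whose constant and coefficients are normalised to zero, which plays the same role as keeping one alternative as the reference level.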
I have already loaded my Excel file into the RStudio Library, loaded the required packages and formed an mlogit.data set. However, I am most likely not doing it right, as I now keep getting errors. I have also attached a screenshot of the code I entered into RStudio.
My question now is: looking at my data in Excel, have I entered the right code into the mlogit.data function? And what do I enter into the mlogit function? Really hoping someone has experience with this mlogit package and could help me.
I am not familiar with the mlogit package, and a quick look at the documentation did not make clear to me everything you need to do. However, one problem is likely that in your call to mlogit() you have elements of the formula in single quotes. You should use the bare column names instead. Formula definitions in R generally take bare column names.
Notice that I have added underscores to the column names that include spaces. Though it is possible to have column names with spaces, it is a source of needless trouble and you should just avoid it.
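A small base R sketch of both points, using made-up column names (`Choice Made`, `Travel Cost`) since the actual names are only visible in the screenshot: spaces are replaced with underscores, and the formula is then written with bare names.

```r
# Hypothetical data with spaces in the column names, as often
# happens after importing a spreadsheet.
df <- data.frame(
  `Choice Made` = c(TRUE, FALSE, FALSE, TRUE),
  `Travel Cost` = c(10, 12, 8, 9),
  check.names = FALSE
)

# Replace every space in the column names with an underscore.
names(df) <- gsub(" ", "_", names(df), fixed = TRUE)

# Bare column names: R treats them as variables to look up in the data.
bare <- Choice_Made ~ Travel_Cost

# Quoted names: R treats them as string constants, not variables.
quoted <- "Choice_Made" ~ "Travel_Cost"

# all.vars() lists the variables a formula refers to.
print(all.vars(bare))
print(all.vars(quoted))
```

The quoted formula refers to no variables at all, which is why modelling functions cannot match its terms to columns of the data.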
I hope that helps some.