Mlogit R Package


Najee Laboy

Aug 4, 2024, 8:07:19 PM
Only relevant if rpar is not NULL; if not NULL, a Halton sequence is used instead of pseudo-random numbers. If halton = NA, some default values are used for the primes of the sequence (the primes are used in order) and for the number of elements dropped. Otherwise, halton should be a list with elements prime (the primes used) and drop (the number of elements dropped).
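As a sketch of a custom halton specification (the dataset dat, its choice column, and the covariates price and time are hypothetical placeholders, not from the documentation):

```r
library(mlogit)

# Hypothetical long-format data set 'dat' with a logical 'choice' column.
# rpar requests a normally distributed random coefficient for 'price';
# halton supplies the prime(s) and the number of initial elements to drop.
fit <- mlogit(choice ~ price + time, data = dat,
              rpar = c(price = "n"),  # 'price' coefficient ~ normal
              R = 100,                # number of simulation draws
              halton = list(prime = c(3), drop = c(100)))
```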

The data argument may be an ordinary data.frame. In this case, some supplementary arguments should be provided and are passed to mlogit.data(). Note that it is not necessary to indicate the choice argument as it is deduced from the formula.
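For instance, a sketch using the Fishing data set that ships with mlogit (the exact shaping arguments, e.g. varying, depend on your data's layout):

```r
library(mlogit)
data("Fishing", package = "mlogit")

# Fishing is in "wide" shape: one row per individual, one column per
# alternative-specific variable. mlogit.data() reshapes it to long format.
Fish <- mlogit.data(Fishing, shape = "wide", varying = 2:9, choice = "mode")
```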


If heterosc = TRUE, the heteroscedastic logit model is estimated. J - 1 extra coefficients are estimated that represent the scale parameters for J - 1 alternatives, the scale parameter for the reference alternative being normalized to 1. The probabilities don't have a closed form; they are estimated using a Gaussian quadrature method.


If rpar is not NULL, the random parameters model is estimated. The probabilities are approximated using simulations with R draws, and Halton sequences are used if halton is not NULL. Pseudo-random numbers are drawn from a standard normal, and the relevant transformations are performed to obtain numbers drawn from a normal, log-normal, censored-normal or uniform distribution. If correlation = TRUE, the correlation between the random parameters is taken into account by estimating the components of the Cholesky decomposition of the covariance matrix. With G random parameters, without correlation G standard deviations are estimated; with correlation, G * (G + 1) / 2 coefficients are estimated.
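A sketch of the correlated case (dat and the variable names are again hypothetical): with G = 2 random parameters, correlation = TRUE estimates 2 * 3 / 2 = 3 Cholesky terms instead of 2 standard deviations.

```r
library(mlogit)

# Hypothetical data; 'price' and 'time' both get normal random coefficients.
fit <- mlogit(choice ~ price + time, data = dat,
              rpar = c(price = "n", time = "n"),
              R = 500,
              correlation = TRUE)  # estimates G*(G+1)/2 = 3 covariance terms
```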


mlogit.data() to shape the data. nnet::multinom() from package nnet performs the estimation of the multinomial logit model with individual-specific variables. mlogit.optim() for details about the optimization function.


I have already loaded my Excel file into RStudio, loaded the required packages and formed an mlogit.data set. However, I am most likely not doing it right, as I now keep getting errors. I have also attached a screenshot of the code I entered into RStudio.


My question now is: looking at my data in Excel, have I entered the right code into the mlogit.data function? And what do I enter into the mlogit function? Really hoping someone has experience with this mlogit package and could help me.


I am not familiar with the mlogit package, and a quick look at the documentation did not make everything you need to do clear to me. However, one likely problem is that in your call to mlogit() you have elements of the formula in single quotes. You should use the bare column names instead; formula definitions in R generally take bare column names.
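For example (the column names here are hypothetical placeholders, not your actual columns):

```r
# Wrong: quoted names are character literals, not column references
# fit <- mlogit('Choice' ~ 'Travel_Cost' + 'Travel_Time', data = dat)

# Right: use bare column names in the formula
fit <- mlogit(Choice ~ Travel_Cost + Travel_Time, data = dat)
```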


Notice that I have added underscores to the column names that include spaces. Though it is possible to have column names with spaces, it is a source of needless trouble and you should just avoid it.
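A quick way to do that renaming in base R (a generic sketch, not tied to your particular columns):

```r
# Toy data frame whose column names contain spaces
df <- data.frame(`Travel Cost` = c(1, 2), `Travel Time` = c(3, 4),
                 check.names = FALSE)

# Replace every space in the column names with an underscore
names(df) <- gsub(" ", "_", names(df), fixed = TRUE)
names(df)  # "Travel_Cost" "Travel_Time"
```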

I hope that helps some.


Moreover, I'd like to set the alternative-specific variables as I said before. Inserting them in part 2 of the mlogit formula gives me two parameter values, but I'd like to have just one parameter, for the mentioned alternative.


You cannot do what you want. It's not a question of mlogit in particular, it's a question of how multinomial logistic regression works. If your dependent variable has 3 levels, you will have 2 intercepts. And you have to use the same independent variables for the whole model (that's true for all methods of regression).


However, referring to the second part of the question ("individual alternative-specific variables (e.g. 0 (if no) / 1 (if yes) home-destination trip, just for walk mode"), I tried to modify the dataset by inserting 3 columns (dhome.auto [all zeros], dhome.transit [all zeros] and dhome.walk [0 if no / 1 if yes it's a home-destination trip]) in order to obtain this variable effective just for walk mode, even if it's now treated as an alternative-specific variable. Then
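The column construction described above can be sketched in base R (a toy data frame; home is a hypothetical 0/1 indicator of a home-destination trip):

```r
# Toy data: one row per trip, with a 0/1 home-destination indicator
df <- data.frame(mode = c("walk", "auto", "transit"),
                 home = c(1, 0, 1))

# Alternative-specific dummies: the effect is zeroed out for auto/transit,
# so the coefficient is effectively estimated for walk only
df$dhome.auto    <- 0
df$dhome.transit <- 0
df$dhome.walk    <- df$home
```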


I am trying to use the mlogit package in R and have been following the vignette trying to figure out how to get the marginal effects for my data. The example provided uses continuous variables, but I am wondering how to do this with categorical explanatory variables.


I have a value of risk which is continuous as a covariate, but I also have age, class, and gender as covariates. I want to see the marginal effects of "females" only or of "Young - females" in regard to risk. How would I do this?


I'm not sure how to manipulate the z data frame to get the mean risk for females or young females to then be able to calculate the marginal effects. Would I do them all separately? Do I somehow divide the data frame by age class (say I have just 2 age classes: young and old) so that I have 1 data frame for the young, and a separate new data frame for old, then calculate mean risk?
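You don't need separate data frames just to get group means; base R can compute the mean risk within each age class directly (z, risk and ageclass here are toy stand-ins for your own objects):

```r
# Toy stand-in for the 'z' data frame
z <- data.frame(risk = c(0.2, 0.4, 0.6, 0.8),
                ageclass = c("young", "young", "old", "old"))

# Mean risk per age class in one call
aggregate(risk ~ ageclass, data = z, FUN = mean)

# If you do want separate data frames per age class, split() does it:
z_by_age <- split(z, z$ageclass)
```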


What I am hoping to get from my own data is to be able to interpret the magnitude of the likelihood in producing my categories of offspring. As an example, what I want to say is that if there is a 1 unit increase in risk, it is 10% more likely for older females to produce 2 offspring. As there is a 1 unit increase in risk, younger females are 15% more likely to produce 2 offspring.


I am not sure how to calculate the marginal effects by hand, and therefore am confused as to how to get a package to do it for me. I've also been trying the nnet and VGAM packages, but neither of these seems to give a great deal of help either.


I sort of got an answer - not sure if it's the best, but it worked. My covariate of interest happened to have just 2 classes, which means I could turn it into a binary 0/1 numeric variable. Therefore, when I reran the code, I could then calculate the mean for this "categorical" variable.
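That recoding is a one-liner in base R (assuming a two-level factor such as sex; the names are illustrative):

```r
sex <- factor(c("female", "male", "female"))

# 1 for "female", 0 otherwise; works for any two-level factor
sex_bin <- as.numeric(sex == "female")
sex_bin  # 1 0 1
```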


This page uses the following packages. Make sure that you can load them before trying to run the examples on this page. If you do not have a package installed, run install.packages("packagename"); if you see the version is out of date, run update.packages().


Please note: The purpose of this page is to show how to use various data analysis commands. It does not cover all aspects of the research process which researchers are expected to do. In particular, it does not cover data cleaning and checking, verification of assumptions, model diagnostics or potential follow-up analyses.


Example 2. A biologist may be interested in food choices that alligators make. Adult alligators might have different preferences from young ones. The outcome variable here will be the type of food, and the predictor variables might be the size of the alligators and other environmental variables.


Example 3. Entering high school students make program choices among general program, vocational program and academic program. Their choice might be modeled using their writing score and their socioeconomic status.


First, we need to choose the level of our outcome that we wish to use as our baseline and specify this in the relevel function. Then, we run our model using multinom. The multinom() function does not include p-value calculations for the regression coefficients, so we calculate p-values using Wald tests (here z-tests).
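The steps above can be sketched with nnet (ml, prog, ses and write stand in for your own data frame and columns):

```r
library(nnet)

# Choose the baseline outcome level, then fit the multinomial model
ml$prog2 <- relevel(ml$prog, ref = "academic")
fit <- multinom(prog2 ~ ses + write, data = ml)

# Wald z-tests: coefficient / standard error, then two-sided normal p-values
z <- summary(fit)$coefficients / summary(fit)$standard.errors
p <- (1 - pnorm(abs(z), 0, 1)) * 2
```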


The ratio of the probability of choosing one outcome category over the probability of choosing the baseline category is often referred to as the relative risk (it is also sometimes referred to as odds, as described in the regression parameters above). The relative risk is the exponentiated right-hand-side linear equation, which means that the exponentiated regression coefficients are relative risk ratios for a unit change in the predictor variable. We can exponentiate the coefficients from our model to see these risk ratios.
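In code (assuming fit holds the fitted multinom model; the object name is a placeholder):

```r
# Relative risk ratios: exponentiated multinomial logit coefficients
exp(coef(fit))
```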


You can also use predicted probabilities to help you understand the model. You can calculate predicted probabilities for each of our outcome levels using the fitted function. We can start by generating the predicted probabilities for the observations in our dataset and viewing the first few rows.
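With fit again standing for the fitted multinom model:

```r
# One row per observation, one column of predicted probability per outcome level
head(fitted(fit))
```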


Next, if we want to examine the changes in predicted probability associated with one of our two variables, we can create small datasets varying one variable while holding the other constant. We will first do this holding write at its mean and examining the predicted probabilities for each level of ses.
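A sketch of that held-constant dataset (ml and fit are placeholders for the data frame and fitted multinom model):

```r
# Each ses level, with write fixed at its sample mean
dses <- data.frame(ses = c("low", "middle", "high"),
                   write = mean(ml$write))

# Predicted probabilities for each synthetic row
predict(fit, newdata = dses, type = "probs")
```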


Another way to understand the model using the predicted probabilities is to look at the averaged predicted probabilities for different values of the continuous predictor variable write within each level of ses.
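One way to sketch this (placeholder names again; the grid of write values and the column indices are assumptions about the example's layout):

```r
# Grid: write from 30 to 70 within each level of ses
dwrite <- data.frame(ses = rep(c("low", "middle", "high"), each = 41),
                     write = rep(30:70, 3))
pp.write <- cbind(dwrite, predict(fit, newdata = dwrite, type = "probs"))

# Average predicted probability for each outcome level, within each ses level
by(pp.write[, 3:5], pp.write$ses, colMeans)
```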


Sometimes, a couple of plots can convey a good deal of information. Using the predictions we generated for the pp.write object above, we can plot the predicted probabilities against the writing score by the level of ses for different levels of the outcome variable.
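One such plot, sketched with ggplot2 (pp.write as described above; the melt step and column names are assumptions):

```r
library(ggplot2)
library(reshape2)

# Long format: one row per (observation, outcome level) pair
lpp <- melt(pp.write, id.vars = c("ses", "write"),
            value.name = "probability")

# Predicted probability vs. write, coloured by ses, one panel per outcome
ggplot(lpp, aes(x = write, y = probability, colour = ses)) +
  geom_line() +
  facet_grid(variable ~ ., scales = "free")
```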


You appear to have hit upon an unlucky combination of optimization parameters, specifically, with respect to the Halton pseudo-random sequence, which may possibly have a bug in it. BFGS appears to be stopping prematurely with R = 300, but not with other significantly smaller or larger values. Fortunately, you don't need large values of R (or Halton at all) in this case.


On my initial run through your code, I got the same results as you did, with runtime statistics indicating that 8 iterations were required for BFGS to converge. I then changed R in the function call to equal 30:


Note that not only are the coefficient estimates correct, in the sense of being reasonably close to the actual values given their standard errors, but the runtime is 40% of that of the R = 300 case, although it requires 22 BFGS iterations instead of 8.


With R = 100, 22 iterations were also needed, the runtime increased by a little over 30% to about half of that required in the initial case, and the coefficient estimates were essentially the same as in the R = 30 run:


Multiple other tests, combined with the OP's tests noted in comments, lead me to conclude that Halton doesn't work consistently well in the one-dimensional case, at least for this problem. In actual practice, where the true parameters are unavailable for comparison with the estimates, it would be necessary to try several different parameterizations of halton and R in the mlogit call and check the results for consistency (and the value of the log-likelihood, I suspect). Avoiding Halton altogether and specifying an increasing sequence of R values for the random number generator until stable estimates are achieved is an alternative that would also likely be workable, runtime considerations aside.
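That stability check could be sketched like this (dat and the formula are placeholders; halton = NULL falls back to pseudo-random draws):

```r
library(mlogit)

# Refit with an increasing number of pseudo-random draws and compare:
# the estimates and log-likelihood should stabilize as R grows
for (R in c(30, 100, 300, 1000)) {
  fit <- mlogit(choice ~ price + time, data = dat,
                rpar = c(price = "n"), R = R, halton = NULL)
  cat("R =", R, " logLik =", as.numeric(logLik(fit)), "\n")
  print(coef(fit))
}
```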
