I got a nan p-value and want to figure out the reason. Is there a way to look at the source code of logistic regression?
--
You received this message because you are subscribed to the Google Groups "pystatsmodels" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pystatsmodel...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pystatsmodels/704a3e72-7cba-42ee-a007-9b0b7cd01852n%40googlegroups.com.
Thanks for the detailed response!! Two sets of data give me the nan p-value problem. One dataset is a matrix of all zeros except a single 0.01; I guess for that one it makes sense that the p-value is nan (maybe very close to 1?), but I don't know why the log-likelihood is ~-27. Is there any randomness involved in the implementation of the logistic regression model here?
On Wednesday, November 3, 2021 at 8:56:37 PM UTC-4 josefpktd wrote:

On Wed, Nov 3, 2021 at 8:36 PM Jason Dou <dou...@gmail.com> wrote:
> I got a nan p-value and want to figure out the reason. Is there a way to look at the source code of logistic regression?

It's not so easy, because most of the code of the Logit model is inherited. statsmodels.discrete.discrete_model.Logit itself defines loglike, score, and hessian. Optimization and inference such as p-values are computed in base.model.LikelihoodModel and LikelihoodModelResults.

If you get nan standard errors (bse) and p-values, then most likely you have a Hessian / cov_params that is not positive definite. Either the design matrix exog is (near) singular or there were convergence failures. Model-specific problems like perfect separation also cause problems for maximum likelihood estimation. We warn for cases that can be easily identified, but there can be problems with the data/model that are more difficult to identify. If some values of exog are large, then the exp in logit might overflow, but that would most likely raise an exception or produce nan everywhere.

There is a lot of code involved in this. It's usually better to directly verify that your data (the design matrix) is well behaved (no singular values close to zero), and then work backwards from bse and cov_params and check why they contain nan.

Josef

--
On Mon, Nov 8, 2021 at 10:22 AM Jason Dou <dou...@gmail.com> wrote:
> Thanks for the detailed response!! Two sets of data give me the nan p-value problem. One dataset is a matrix of all zeros except a single 0.01; I guess for that one it makes sense that the p-value is nan (maybe very close to 1?), but I don't know why the log-likelihood is ~-27. Is there any randomness involved in the implementation of the logistic regression model here?

The algorithm itself is deterministic; we don't add any random noise. However, there are two main reasons why results can differ across runs and machines:

- convergence tolerance: by default we only get optimization at a precision of about 1e-5 to 1e-8. Different versions of the optimizer can produce results that differ within that range.
- floating point noise: all computations are only "deterministic" subject to floating point arithmetic.

In regular, well-posed cases this might only affect the last digits of the result; e.g., our unit tests work in those cases with something like rtol=1e-13. In ill-posed problems, the floating point noise might dominate. For example, if the smallest singular value is 1e-14, then some results are often only or mostly floating point noise, and precision might not be even one digit.

A related problem, which I guess you have, is that if a value is theoretically zero or very small, then even small floating point noise can make it negative, which causes problems if that creates a "negative variance".

The dataset you mention sounds ill posed. The information in the data is too small to avoid floating point noise around zero. In your case, the log-likelihood might look reasonable because it uses computations that are less numerically fragile than, for example, the computation of the standard errors bse.
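The "negative variance" failure mode described above is easy to reproduce in isolation: a diagonal entry of the covariance matrix that is theoretically zero can come out as a tiny negative number due to floating point noise, and the square root taken for the standard error then returns nan, which propagates into the z-statistic and the p-value. The numbers below are illustrative, not taken from any real fit:

```python
import numpy as np

# A variance that should be exactly 0 but picked up floating-point noise.
var = np.float64(-1e-16)

# Standard error = sqrt(variance); sqrt of a negative number is nan.
with np.errstate(invalid="ignore"):
    bse = np.sqrt(var)

print(bse)        # nan
print(0.5 / bse)  # nan -- the z-statistic, and hence the p-value, is nan too
```

This is why a nan p-value often traces back not to the optimizer but to a cov_params matrix that is not positive definite: any nonpositive diagonal entry poisons everything downstream.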