Hi folks,
- I have a million or so data points that appear to follow an exponential trend
- Is there a way for statsmodels to decide what is the best fit/correct estimation?
- The data is a mix between integers, real numbers, and categorical data (20 or so rows).
- The purpose of the analysis is to answer the following question:
- "What is the 'y' value for a given vector 'x' entered by the user, and what is the level of confidence or correctness of the answer?"
- I am not 100% sure how to approach this problem.
Thanks,
A
On Wed, Jun 29, 2016 at 10:01 PM, Ahmed Dassouki <dass...@gmail.com> wrote:
Hi folks,
- I have a million or so data points that appear to follow an exponential trend
If the variance looks stable, then you could just use a trend variable. In the more common case, where the variance also increases proportionally with y, an exponential model or a log-transformed y is usually used: either ols('np.log(y) ~ ...'), or, often more appropriate, Poisson, or GLM with family Poisson with cov_type='HC0' to correct the standard errors for continuous data.
I would say that the variance is not stable. `Item_Age` is one of the x variables and it seems that the `y` (item condition) is more accurate for newer items than it is for older items.
- Is there a way for statsmodels to decide what is the best fit/correct estimation?
That is difficult when models are not nested and/or have differently transformed endog y. The best way to compare across models, when prediction is the target, is to use a hold-out data set, similar to what's usually done in scikit-learn/machine learning. But statsmodels doesn't have automatic support for this.
- The data is a mix between integers, real numbers, and categorical data (20 or so rows).
Does this refer to the explanatory variables x, and is the dependent variable y continuous?
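Assuming the mix of types refers to the explanatory variables, the formula interface handles it directly: numeric columns (integer or real) enter as-is, and `C()` dummy-encodes a categorical column. The column names below are hypothetical, and `get_prediction` is one way to get the "level of confidence" asked about in the original question:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical columns for illustration (assumption): one integer,
# one real-valued, one categorical explanatory variable.
rng = np.random.default_rng(2)
n = 500
df = pd.DataFrame({
    "item_age": rng.integers(0, 20, n),          # integer
    "weight": rng.uniform(0.5, 5.0, n),          # real-valued
    "category": rng.choice(["a", "b", "c"], n),  # categorical
})
effect = {"a": 0.0, "b": 0.3, "c": -0.2}
mu = np.exp(0.1 * df["item_age"] + 0.2 * df["weight"]
            + df["category"].map(effect))
df["y"] = mu * rng.lognormal(sigma=0.2, size=n)

# C() tells the formula interface to dummy-encode the categorical
# column; the numeric columns enter untransformed.
res = smf.ols("np.log(y) ~ item_age + weight + C(category)",
              data=df).fit()
print(res.params)

# Prediction for a new x vector, with confidence and prediction
# intervals (on the log scale here, since the response is np.log(y)).
new_item = pd.DataFrame({"item_age": [5], "weight": [2.0],
                         "category": ["b"]})
pred = res.get_prediction(new_item).summary_frame(alpha=0.05)
print(pred[["mean", "mean_ci_lower", "mean_ci_upper"]])
```

`summary_frame` also contains `obs_ci_lower`/`obs_ci_upper`, the wider prediction interval for a single new observation, which is usually the more honest answer to "how correct is this prediction?".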