Hi all,
New to this group so don't hesitate to let me know if my question isn't very clear. I'm a new user of statsmodels and I'm loving it but I've run into the issue described in this StackOverflow question:
http://stackoverflow.com/questions/34035912/patsy-new-levels-in-categorical-fields-in-test-data
Basically, when doing cross validation for an OLS model, some categories may not be present in the subset of the data used to fit the model. The following error is then raised when trying to predict the outcome for an input that contains unexpected categories:
```
/anaconda/envs/house/lib/python3.5/site-packages/patsy/categorical.py in categorical_to_int(data, levels, NA_action, origin)
360 "observation with value %r does not match "
361 "any of the expected levels (expected: %s)"
--> 362 % (value, level_str), origin)
363 except TypeError:
364 raise PatsyError("Error converting data to categorical: "
PatsyError: Error converting data to categorical: observation with value 'Metal' does not match any of the expected levels (expected: ['ClyTile', 'CompShg', ..., 'WdShake', 'WdShngl'])
```
I guess one solution would be to drop all the observations whose categorical values were not among the levels seen when the model was fit. Is there a better way of dealing with this?
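To make the drop-the-rows workaround concrete, here is a minimal sketch of what I have in mind. It uses only pandas, and everything in it is an assumption for illustration: the data is made up, the column names `roof` and `price` are hypothetical, and `drop_unseen_levels` is a helper I wrote for this example, not part of patsy or statsmodels.

```python
import pandas as pd

# Made-up data; `roof` and `price` are hypothetical column names.
train = pd.DataFrame({
    "roof": ["CompShg", "WdShake", "CompShg", "WdShngl"],
    "price": [200.0, 250.0, 210.0, 300.0],
})
test = pd.DataFrame({
    "roof": ["CompShg", "Metal", "WdShngl"],  # 'Metal' never appears in train
    "price": [205.0, 400.0, 310.0],
})


def drop_unseen_levels(train_df, test_df, columns):
    """Return the rows of test_df whose values in `columns` all occur in train_df."""
    keep = pd.Series(True, index=test_df.index)
    for col in columns:
        # Keep a row only if its level was present in the training subset.
        keep &= test_df[col].isin(train_df[col].unique())
    return test_df[keep]


# Rows with levels the fitted model has never seen are removed before predicting.
test_seen = drop_unseen_levels(train, test, ["roof"])
print(test_seen)
```

The obvious downside is that the dropped rows get no prediction at all, which is why I'm hoping there is something better.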
Thanks!
Sam