I have the following problem:
What I want is to perform the regression in a similar manner, except that each category's fit should also include the datapoints from the categories above and below it. E.g. I want to regress on the groups (0,1,2), (1,2,3), (2,3,0) and (3,0,1) instead of on 0, 1, 2, 3 individually. The reason for doing this is a lack of datapoints in the individual categories.
Is there a clean way of doing this in Python?
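To make the idea concrete, here is a minimal sketch of what I mean, assuming a separate quadratic fit per group and using plain `np.polyfit` rather than a formula interface. The helper name `fit_pooled_quadratics`, the `offsets` parameter and the synthetic data are made up for illustration; they are not from my real dataset.

```python
import numpy as np

def fit_pooled_quadratics(cat, var1, var2, n_categories, offsets=(-1, 0, 1)):
    """Fit var1 ~ 1 + var2 + var2**2 once per category, pooling neighbours.

    For category k the fit uses every row whose category is
    (k + o) % n_categories for some o in offsets, so groups wrap around.
    Returns {k: coefficients}, highest power first (np.polyfit order).
    """
    cat = np.asarray(cat)
    var1 = np.asarray(var1, dtype=float)
    var2 = np.asarray(var2, dtype=float)
    params = {}
    for k in range(n_categories):
        members = [(k + o) % n_categories for o in offsets]
        mask = np.isin(cat, members)          # rows belonging to this group
        params[k] = np.polyfit(var2[mask], var1[mask], deg=2)
    return params

# Hypothetical noise-free data: 4 categories, 5 points each.
rng = np.random.default_rng(0)
cat = np.repeat(np.arange(4), 5)
var2 = rng.uniform(0.0, 10.0, size=cat.size)
var1 = 2.0 + 3.0 * var2 + 0.5 * var2**2

# offsets=(0, 1, 2) gives the groups (0,1,2), (1,2,3), (2,3,0), (3,0,1).
params = fit_pooled_quadratics(cat, var1, var2, n_categories=4, offsets=(0, 1, 2))
```

With the default `offsets=(-1, 0, 1)` each category is pooled with the one below and the one above it instead. Note that this fits each group independently, so the same datapoint is used in several fits.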
I'm sorry if it's a little vague; I'll try to clarify. In my current dataset the number of datapoints in each category isn't always large enough to perform a proper regression (due to outliers). Unlike this snippet, my real dataset consists of 16 categories. I want to create a different regression line/model for each category (different params for each category). To have enough datapoints for, say, category 7, I want to perform the regression using the datapoints from categories 6, 7 and 8 (category 7 itself plus the categories below and above it).

Are you saying I should do the following (on my example dataset)?

a = np.array([[ 1., 1., 1., 0.],
              [ 0., 1., 1., 1.],
              [ 0., 0., 1., 1.],
              [ 1., 0., 0., 1.]])
formula = 'var1 ~ C(cat,a):var2 + C(cat,a):I(var2**2)'
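For 16 categories, typing such a matrix by hand is error-prone, so it could be generated instead. The sketch below is only an illustration: the helper `neighbour_matrix` is a made-up name, and it assumes that row k marks the categories pooled into group k, with wrap-around at the ends. Whether that orientation is what the contrast argument of `C(...)` expects is not verified here.

```python
import numpy as np

def neighbour_matrix(n_categories, offsets=(-1, 0, 1)):
    """Return an (n, n) 0/1 matrix; row k flags the categories in group k.

    Group k contains category (k + o) % n_categories for each o in offsets.
    """
    m = np.zeros((n_categories, n_categories))
    for k in range(n_categories):
        for o in offsets:
            m[k, (k + o) % n_categories] = 1.0
    return m

# Groups (0,1,2), (1,2,3), (2,3,0), (3,0,1) for the 4-category example.
a4 = neighbour_matrix(4, offsets=(0, 1, 2))

# Below/self/above pooling for 16 categories: row 7 flags 6, 7 and 8.
a16 = neighbour_matrix(16)
```

Under these assumptions every row contains exactly three ones. The hand-typed matrix above has only two ones in its last two rows, so it would not match the wrap-around groups (2,3,0) and (3,0,1).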