CatBoostRegressor - Number of columns is different with number of features (in clickhouse, but not in python)


kriticar

Oct 11, 2018, 4:10:55 AM
to ClickHouse
Hi,

I am experimenting with CatBoost. I have defined the model as follows:
model = CatBoostRegressor(iterations=2, learning_rate=1, depth=2)

The model is trained on the Boston housing dataset (from sklearn.datasets import load_boston).

The model is trained on 13 features:

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
(404, 13) (102, 13) (404,) (102,)

model.fit(X_train, y_train, plot=True)

X_test.head()


I have saved the model (model.save_model('boston.cbm')) and imported it into ClickHouse.
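For readers following along: ClickHouse picks up CatBoost models through a model configuration file referenced by models_config in the server config, plus catboost_dynamic_library_path pointing at libcatboostmodel.so. A sketch of the model config (paths are placeholders, not my actual setup):

```xml
<models>
    <model>
        <type>catboost</type>
        <!-- The name used in modelEvaluate('boston', ...) -->
        <name>boston</name>
        <path>/var/lib/clickhouse/models/boston.cbm</path>
        <!-- Reload interval; 0 disables periodic reloading -->
        <lifetime>0</lifetime>
    </model>
</models>
```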

When I try to evaluate the model on the same values as the first row of X_test (id 173):

SELECT
    modelEvaluate('boston', 0.09178, 0.0, 4.05,    0.0, 0.510, 6.416, 84.1, 2.6463, 5.0, 296.0, 16.6, 395.50, 9.04) AS prediction

I get the following error:

SQL Error [36]: ClickHouse exception, code: 36, host: 192.168.112.38, port: 8123; Code: 36, e.displayText() = DB::Exception: Number of columns is different with number of features: 13 vs 8 + 0, e.what() = DB::Exception



I tried deleting the last 5 numbers from the SELECT so that modelEvaluate is called with 8 numbers:

SELECT
    modelEvaluate('boston', 0.09178, 0.0, 4.05, 0.0, 0.510, 6.416, 84.1, 2.6463) AS prediction

and now the SELECT returns a prediction of 25.618446180923726.


model.get_feature_importance() shows:

[21.83750733386891,
 0.0,
 0.0,
 0.0,
 0.0,
 77.71458264586703,
 0.0,
 0.44791002026405957]

(len is 8)

In Python I would predict with all 13 features:

X_test.iloc[0]

CRIM         0.09178
ZN           0.00000
INDUS        4.05000
CHAS         0.00000
NOX          0.51000
RM           6.41600
AGE         84.10000
DIS          2.64630
RAD          5.00000
TAX        296.00000
PTRATIO     16.60000
B          395.50000
LSTAT        9.04000
Name: 173, dtype: float64

model.predict([X_test.iloc[0]])

array([25.61844618])

How come I trained the model on 13 features, but in ClickHouse I can only use 8?

Can someone please help?

Regards.

kriticar

Oct 15, 2018, 3:19:34 AM
to ClickHouse
I don't know why or how, but suddenly model.get_feature_importance() shows all 13 features, and now everything works.
I don't know what happened. I restarted the computer; maybe that is the reason.

[0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 43.94971980626672,
 0.0,
 26.557936918362053,
 1.3307625125338811,
 0.0,
 0.0,
 0.0,
 28.161580762837353]

