Hi,
I am experimenting with CatBoost. I defined the model as follows:
model = CatBoostRegressor(iterations=2, learning_rate=1, depth=2)
The model is trained on the Boston housing dataset (from sklearn.datasets import load_boston).
The training data has 13 features:
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
(404, 13) (102, 13) (404,) (102,)
model.fit(X_train, y_train, plot=True)
X_test.head()
I saved the model (model.save_model('boston.cbm')) and loaded it into ClickHouse.
When I evaluate it with the same values as the first row of X_test (id=173):
SELECT
modelEvaluate('boston', 0.09178, 0.0, 4.05, 0.0, 0.510, 6.416, 84.1, 2.6463, 5.0, 296.0, 16.6, 395.50, 9.04) AS prediction
I get the following error:
SQL Error [36]: ClickHouse exception, code: 36, host: 192.168.112.38, port: 8123; Code: 36, e.displayText() = DB::Exception: Number of columns is different with number of features: 13 vs 8 + 0, e.what() = DB::Exception
I then removed the last 5 values from the SELECT so that modelEvaluate is called with only 8 values:
SELECT
modelEvaluate('boston', 0.09178, 0.0, 4.05, 0.0, 0.510, 6.416, 84.1, 2.6463) AS prediction
Now the query succeeds and returns a prediction of 25.618446180923726.
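For anyone who wants to reproduce the two calls, here is a minimal sketch of how I build the query text from a row. The helper name build_model_evaluate_query is hypothetical (it is not part of ClickHouse or CatBoost); it only formats a string and does not talk to the server.

```python
# Hypothetical helper (not a ClickHouse or CatBoost API): formats a
# modelEvaluate call using only the first n_features values of a row.
def build_model_evaluate_query(model_name, row, n_features):
    args = ", ".join(repr(float(v)) for v in row[:n_features])
    return f"SELECT modelEvaluate('{model_name}', {args}) AS prediction"


# First row of X_test (id=173), in the same column order as the training data.
row = [0.09178, 0.0, 4.05, 0.0, 0.510, 6.416, 84.1, 2.6463,
       5.0, 296.0, 16.6, 395.50, 9.04]

# Passing 13 reproduces the failing call; passing 8 reproduces the working one.
query = build_model_evaluate_query("boston", row, 8)
print(query)
```

Note that Python prints 0.510 as 0.51, which is the same value.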
model.get_feature_importance() shows:
[21.83750733386891,
0.0,
0.0,
0.0,
0.0,
77.71458264586703,
0.0,
0.44791002026405957]
(the list has length 8, not 13)
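To see which columns these 8 values correspond to, here is a small sketch that maps the importances above to column names. It assumes the importances are listed in the same order as the training columns (the order shown by X_test.iloc[0] in this post). My guess, which I have not confirmed in the CatBoost or ClickHouse documentation, is that the saved model only keeps features up to the highest index actually used in a split.

```python
# Column order of the Boston housing data, as printed by X_test.iloc[0].
feature_names = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE", "DIS",
                 "RAD", "TAX", "PTRATIO", "B", "LSTAT"]

# Values returned by model.get_feature_importance() in this post.
importances = [21.83750733386891, 0.0, 0.0, 0.0, 0.0,
               77.71458264586703, 0.0, 0.44791002026405957]

# Assumption: importances[i] belongs to feature_names[i].
used = [(i, feature_names[i]) for i, imp in enumerate(importances) if imp > 0]
highest_used_index = max(i for i, _ in used)

print(used)                    # [(0, 'CRIM'), (5, 'RM'), (7, 'DIS')]
print(highest_used_index + 1)  # 8
```

With iterations=2 and depth=2 the model can make at most 4 splits, so only a few features can have non-zero importance. The highest used index is 7 (DIS), which gives 8 features and would match the "13 vs 8 + 0" in the error message.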
In Python I predict with all 13 features:
X_test.iloc[0]
CRIM 0.09178
ZN 0.00000
INDUS 4.05000
CHAS 0.00000
NOX 0.51000
RM 6.41600
AGE 84.10000
DIS 2.64630
RAD 5.00000
TAX 296.00000
PTRATIO 16.60000
B 395.50000
LSTAT 9.04000
Name: 173, dtype: float64
model.predict([X_test.iloc[0]])
array([25.61844618])
Why does ClickHouse accept only 8 features when the model was trained on 13? The prediction with 8 values matches the Python prediction with 13.
Can someone please help?
Regards.