I am trying to compare two machine learning algorithms (linear regression vs. random forest) using the 5x2cv paired t-test. However, my code returns a t statistic of 0 and a p-value of 1.
When I train the two algorithms separately, their MSE values differ (with 100 epochs), so I'm not sure why I get a t statistic of 0 and a p-value of 1.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
import pickle
from mlxtend.evaluate import paired_ttest_5x2cv
# Load the pickled dataset
with open('datamrosbmd1103_B1FND', 'rb') as file_handler:
    data = pickle.load(file_handler)
X, Y = data.get('X', []).values, data.get('Y', []).values
linear = LinearRegression()
rf = RandomForestRegressor()
gb = GradientBoostingRegressor()
# Compare linear regression against random forest with the 5x2cv paired t-test
t, p = paired_ttest_5x2cv(estimator1=linear,
                          estimator2=rf,
                          X=X, y=Y,
                          random_seed=25)
print("t statistic: %.5f" % t)
print("p value: %.5f" % p)
After running this code, I get a t statistic of 0 and a p-value of 1. Could you help me understand why I get a p-value of 1?
Thank you.
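
For reference, the statistic behind this test can be sketched by hand. This is a minimal sketch, not mlxtend's code: the function name `five_by_two_cv_t` and the numbers in `diffs` are hypothetical. It assumes the standard 5x2cv formulation, where the numerator is the score difference from the first fold of the first replication and the denominator pools the variance over all five replications. The p-value uses a closed form of Student's t distribution that is valid for 5 degrees of freedom, chosen only to avoid a SciPy dependency.

```python
import math


def five_by_two_cv_t(diffs):
    """Return (t, p) for the 5x2cv paired t-test.

    diffs: five (d1, d2) pairs, one per replication, where d1 and d2 are
    the score differences (model A minus model B) on the two folds.
    """
    if len(diffs) != 5:
        raise ValueError("expected exactly 5 replications")

    # Sum the per-replication variances of the two fold differences.
    variance_sum = 0.0
    for d1, d2 in diffs:
        mean = (d1 + d2) / 2.0
        variance_sum += (d1 - mean) ** 2 + (d2 - mean) ** 2
    if variance_sum == 0.0:
        raise ValueError("fold differences are identical in every replication")

    # Numerator: the difference from the first fold of the first replication.
    t = diffs[0][0] / math.sqrt(variance_sum / 5.0)

    # Two-sided p-value, closed form for exactly 5 degrees of freedom.
    theta = math.atan(abs(t) / math.sqrt(5.0))
    s, c = math.sin(theta), math.cos(theta)
    p = 1.0 - (2.0 / math.pi) * (theta + s * (c + (2.0 / 3.0) * c ** 3))
    return t, p


# Made-up differences: one fold where a model's score is wildly off.
diffs = [(0.02, -0.01), (0.03, 0.01), (-50000.0, 0.02), (0.01, 0.00), (0.02, 0.01)]
t, p = five_by_two_cv_t(diffs)
print("t statistic: %.5f" % t)
print("p value: %.5f" % p)
```

With these made-up values, a single fold with a very large score difference inflates the denominator, so the result prints as t = 0.00000 and p = 1.00000 even though t is not exactly zero. A first-fold difference that is itself zero would give the same output. Printing the per-fold differences for the real data would show whether either case applies here.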