On 20 November 2018, in "Variable impacts in symbolic regression", Gabriel said:
For the "Best training solution" we determine impacts not by counting variables but by some kind of sensitivity analysis. E.g. to determine the impact of x1 we set x1 to its median value and evaluate the model again. If the model error is increased significantly then this is an indication that x1 is important. If the model error is almost unchanged then this indicates that x1 is not important. HL has several different replacement strategies which are available for this sensitivity analysis (I prefer shuffling).
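To check that I understand the quoted procedure, here is a minimal Python sketch of how I read it. This is not HeuristicLab code: the helper names `mse` and `variable_impact`, the toy data, and the choice of mean squared error as the error measure are all my own assumptions (the error measure is exactly one of the things I ask about below).

```python
import random
import statistics

def mse(model, rows, targets):
    """Mean squared error of `model` over `rows` (assumed error measure)."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def variable_impact(model, rows, targets, index, strategy="median", seed=0):
    """Hypothetical helper: increase in error after replacing one variable."""
    column = [r[index] for r in rows]
    if strategy == "median":
        # Replace every value of the variable with its median.
        replaced = [statistics.median(column)] * len(column)
    elif strategy == "shuffle":
        # Permute the variable's values across rows.
        replaced = column[:]
        random.Random(seed).shuffle(replaced)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    perturbed = [r[:index] + [v] + r[index + 1:] for r, v in zip(rows, replaced)]
    # A large positive difference suggests the variable matters to the model.
    return mse(model, perturbed, targets) - mse(model, rows, targets)

# Toy data: the target depends only on x1 (index 0); x2 (index 1) is ignored.
rows = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0], [5.0, 7.0]]
targets = [2.0 * r[0] for r in rows]
model = lambda r: 2.0 * r[0]

impact_x1 = variable_impact(model, rows, targets, 0)  # error rises clearly
impact_x2 = variable_impact(model, rows, targets, 1)  # error is unchanged
```

In this toy case, replacing x1 by its median raises the error while replacing x2 changes nothing, which matches the "important" versus "not important" reading in the quote.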
My questions are about "Variable impacts in symbolic classification".
I would like clarification on the following points:
- Is the use of the median value selected through the "Replacement for numeric variables" setting?
- You say "If the model error is increased". Which metric do you use as the model error? Mean squared error?
- What is the relationship between the value shown for VariableImpacts and this change in model error?