Hi Nico,
The ablation in previous versions of irace had some problems under some scenarios.
1. It depends on how you call the ablation command-line tool or R function. By default, the source is the first configuration evaluated by irace. If no "default" or "initial" configuration is given to irace, that first configuration is generated uniformly at random. You can specify a different source.
2. Yes, but "better" on average (you are plotting mean cost) and with respect to the instances used for ablation (which instances are used is also an option of ablation). This definition of "better" may not be the same definition that irace is using for elimination.
3. Yes. Ablation may evaluate configurations that irace itself never explored, so in principle it could find a better configuration. Ablation may also evaluate instances that irace never evaluated, and if the instances have some hidden structure (like different classes), this may affect the results. Moreover, if you use test instances for ablation and then use the results of ablation to choose a configuration, you are cheating (yourself) by using test instances for tuning the parameters. Finally, irace will re-evaluate the configurations on the instances during ablation. If the evaluations are very noisy, you may get slightly different results. These differences in means may be just noise when you look at the variance (boxplots or confidence intervals).
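To make the point about noise concrete, here is a minimal sketch in Python (it is not irace code, and the cost values are made up purely for illustration). It computes the mean of the per-instance cost differences between two configurations and an approximate 95% confidence interval using a normal approximation. The function name and the data are my own assumptions.

```python
import math
import statistics


def paired_mean_diff_ci(costs_a, costs_b, z=1.96):
    """Return (mean difference, lower, upper) for paired costs a - b.

    Uses a normal approximation (z = 1.96 for roughly 95%). With few
    instances, a t quantile would give a wider, more honest interval.
    """
    if len(costs_a) != len(costs_b):
        raise ValueError("costs must be paired by instance")
    diffs = [a - b for a, b in zip(costs_a, costs_b)]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, mean_diff - z * std_err, mean_diff + z * std_err


# Made-up costs of two configurations on the same 8 instances.
first_config = [10.2, 11.5, 9.8, 12.1, 10.9, 11.0, 10.4, 11.7]
irace_config = [10.0, 11.9, 9.5, 12.4, 10.6, 11.3, 10.1, 11.9]

mean_diff, lower, upper = paired_mean_diff_ci(first_config, irace_config)
interval_contains_zero = lower <= 0.0 <= upper
print(f"mean difference = {mean_diff:.3f}, CI = [{lower:.3f}, {upper:.3f}]")
print("interval contains zero:", interval_contains_zero)
```

With these made-up numbers the interval straddles zero, so the small difference in means would not be evidence that either configuration is better.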
For the second plot:
1. I cannot tell you that. How large a difference has to be in order to matter depends on the application. Also, without looking at the variance, it is difficult to say whether the difference is statistically significant at all (perhaps the default setting of ablation should show confidence intervals...)
2. What the plot tells you is not that a component is bad, but that an individual change relative to the previous configuration improved or worsened the mean cost. It may happen that multiple parameter changes are required to get an improvement. Or a parameter change may look bad only because of the values in the source configuration.
3. As discussed above, there are many reasons why the configuration returned by irace may seem worse (variance, different instances, etc.) and also why it may actually be worse (ablation explores configurations that irace doesn't). But if you are using the same training instances in irace and in the ablation, and the configuration returned by irace is actually worse than the first configuration evaluated, then either you were very unlucky (irace is a stochastic heuristic after all) or there is something wrong with the irace setup (too little budget, the instances have some block structure but irace has not been told, you are looking at mean values but you used the F-test, there is something different between your irace setup and the ablation setup, etc.).
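As an illustration of why a single step in the ablation plot can look bad, here is a toy sketch in Python of the general greedy ablation idea (one parameter changed per step, keeping the best change). It is not irace's implementation. The parameter names alpha, beta and gamma, and the cost function, are hypothetical and chosen so that alpha and beta only pay off when changed together.

```python
def mean_cost(config):
    """Hypothetical cost: 'alpha' and 'beta' only help when changed together."""
    cost = 10.0
    if config["alpha"] == "new" and config["beta"] == "new":
        cost -= 3.0  # the joint change gives the improvement
    elif config["alpha"] == "new" or config["beta"] == "new":
        cost += 1.0  # either change alone looks bad
    if config["gamma"] == "new":
        cost -= 0.5  # small independent gain
    return cost


def ablation_path(source, target, cost_fn):
    """Greedy path from source to target, one parameter change per step.

    At each step, try every remaining single-parameter change towards the
    target and keep the one with the lowest cost (ties broken by name).
    """
    current = dict(source)
    remaining = [p for p in source if source[p] != target[p]]
    path = [("source", cost_fn(current))]
    while remaining:
        candidates = []
        for param in remaining:
            trial = dict(current)
            trial[param] = target[param]
            candidates.append((cost_fn(trial), param, trial))
        best_cost, best_param, current = min(candidates, key=lambda c: (c[0], c[1]))
        remaining.remove(best_param)
        path.append((best_param, best_cost))
    return path


source = {"alpha": "old", "beta": "old", "gamma": "old"}
target = {"alpha": "new", "beta": "new", "gamma": "new"}
path = ablation_path(source, target, mean_cost)
for changed, cost in path:
    print(f"{changed:>6}: {cost:.1f}")
```

In this toy case the step that changes alpha raises the cost from 9.5 to 10.5, yet that change is required for the final drop to 6.5 once beta is changed as well. Reading the alpha step in isolation as "alpha is bad" would be the wrong conclusion.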
I hope the above helps!
Manuel.