Hi,
I have used quantile regression fairly extensively in a couple of GBM implementations, including H2O. Because each quantile is solved independently, it is not uncommon to find the predicted quantiles "out of order." I usually impose a sort on the final predictions, since I expect some degree of this behavior, and I would expect similar behavior from any method (GBM or otherwise) that solves quantiles independently.
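To be concrete about the sort fix: here is a minimal NumPy sketch, assuming you have stacked the per-alpha predictions into one array where each row is an observation and the columns are the quantile levels in increasing order (the array name and values are made up for illustration). Sorting each row restores monotonicity across the quantiles.

```python
import numpy as np

# Hypothetical predictions: rows = observations,
# columns = quantile levels (0.05, 0.4, 0.95) from three independent models.
preds = np.array([
    [1.2, 0.9, 3.1],   # crossed: the 0.4 prediction is below the 0.05 one
    [0.5, 1.0, 2.0],   # already in order
])

# Re-sorting within each row enforces non-crossing quantile predictions.
fixed = np.sort(preds, axis=1)
```

This is sometimes called monotonic rearrangement; it changes only the rows that were crossed and leaves well-ordered rows untouched.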
What you are experiencing does seem a bit extreme, though. If I'm understanding correctly, 5.7% of the predictions from the alpha 0.4 model are either lower than the prediction for the same row at alpha 0.05 or higher than the prediction for the same row at alpha 0.95. Strictly speaking, the 0.05 model's predictions should fall below the observed target roughly 95% of the time, the 0.4 model's roughly 60%, and the 0.95 model's roughly 5%. Can you measure whether that holds for your (1) train predictions and (2) test predictions?
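The two checks above are easy to script; here is a hedged NumPy sketch (function and array names are my own, not from H2O) that measures both the crossing rate and the per-alpha calibration on any set of predictions and targets you have in hand.

```python
import numpy as np

def crossing_rate(p05, p40, p95):
    """Fraction of rows where the alpha-0.4 prediction falls
    outside the [alpha-0.05, alpha-0.95] band for the same row."""
    return np.mean((p40 < p05) | (p40 > p95))

def coverage(pred, y):
    """Fraction of targets at or below the quantile prediction.
    For a well-calibrated alpha-quantile model this should be
    close to alpha (e.g. ~0.05 for the alpha-0.05 model)."""
    return np.mean(y <= pred)
```

Run both on train and test predictions separately; a large gap between the two usually points at overfitting rather than a quantile-specific problem.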
Trying more precise GBMs (more trees, lower learning rates) may help. At a high learning rate like the default, I have run into 20/50/80 quantiles being out of order at a rate similar to what you are seeing; those got a little better after reducing the learning rate. That particular model also had few features, mainly categoricals, so intuitively it was a fairly fragile set of models, and fragile fits can make the independently solved quantiles highly variable.
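If you want to experiment with this outside H2O first, here is a rough sketch using scikit-learn's quantile-loss GBM on synthetic data (all data and parameter values are illustrative, not a reproduction of your setup); it fits the three alphas at a high and a low learning rate so you can compare crossing rates side by side.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=500)

def fit_quantiles(alphas, learning_rate, n_estimators):
    """Fit one independent quantile-loss GBM per alpha; return train predictions."""
    preds = {}
    for a in alphas:
        model = GradientBoostingRegressor(
            loss="quantile", alpha=a,
            learning_rate=learning_rate,
            n_estimators=n_estimators,
            random_state=0,
        )
        model.fit(X, y)
        preds[a] = model.predict(X)
    return preds

def crossing_rate(p):
    """Fraction of rows where the 0.4 prediction leaves the [0.05, 0.95] band."""
    return np.mean((p[0.4] < p[0.05]) | (p[0.4] > p[0.95]))

alphas = (0.05, 0.4, 0.95)
fast = fit_quantiles(alphas, learning_rate=0.3, n_estimators=50)    # coarse fit
slow = fit_quantiles(alphas, learning_rate=0.03, n_estimators=500)  # more precise fit
```

Comparing `crossing_rate(fast)` against `crossing_rate(slow)` on your own data should tell you quickly whether the lower learning rate is buying you anything before you rerun the full H2O models.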
Internally, we can double-check our test cases for quantile regression and ensure it is operating the way we intend.
Thanks,
Mark Landry
Data scientist, H2O