I suspect the large MAD values etc. in that paper are due more to shaded moments being retained in the data. I don't see how else a reasonable model like DISC can be (on average) 40% incorrect or more.
Sources like SolCast and the NSRDB are pretty good at squeezing down annual average biases. When we looked at an early version of the PSM (the model behind the NSRDB) and did hourly comparisons at locations not available to NREL, we found low annual bias but substantial scatter in the hourly errors. Two adjacent pixels had different error distributions. I'm sure the NSRDB has improved with time, but that points out the limitation of statistical validations: over time the errors average out, but at any given moment at a selected location, the error can be substantial.
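
To make the bias-versus-scatter point concrete, here is a minimal sketch using purely synthetic numbers (not NSRDB, PSM, or SolCast data). The 15% relative noise level, the 100-900 W/m^2 range, and all variable names are assumptions chosen only for illustration. It shows how signed errors can cancel to a near-zero mean bias while the mean absolute deviation stays large.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hourly GHI "measurements" (W/m^2) for a year of daylight hours.
n_hours = 4000
measured = rng.uniform(100.0, 900.0, size=n_hours)

# Hypothetical modeled values: unbiased on average, but with an assumed
# 15% relative random error in each individual hour.
modeled = measured * (1.0 + rng.normal(0.0, 0.15, size=n_hours))

error = modeled - measured
mean_measured = measured.mean()

mbe = error.mean()                   # mean bias error: signed errors cancel
mad = np.abs(error).mean()           # mean absolute deviation: no cancellation
rmse = np.sqrt((error ** 2).mean())  # root mean square error

print(f"MBE:  {100 * mbe / mean_measured:.2f} % of mean")
print(f"MAD:  {100 * mad / mean_measured:.2f} % of mean")
print(f"RMSE: {100 * rmse / mean_measured:.2f} % of mean")
```

With these assumptions the relative MBE comes out well under a few percent while the MAD and RMSE are on the order of ten percent or more, which is the pattern described above: a good annual average says little about the error in any single hour.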
Cliff