The big problem is that the margin of error keeps growing: the more nets are tested, the larger the accumulated error.
This is because each net is only compared to the previous one, so the per-match errors add up along the chain.
This can easily be avoided with a better method like the one I proposed: always compare against some fixed anchor net of known strength (see the sketch below).
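Here is a minimal sketch of what I mean. The function names, the 55% score, and the 400-games-per-match figure are my own assumptions for illustration; this is not LCZero's actual test tooling, and it ignores draw effects:

```python
import math

def elo_diff(score):
    """Elo difference implied by a match score (fraction of points won)."""
    return 400 * math.log10(score / (1 - score))

def elo_stderr(score, games):
    """Approximate standard error of the Elo estimate for one match,
    via the delta method on the binomial score (draws ignored)."""
    # d(elo)/d(score) = 400 / (ln(10) * score * (1 - score))
    deriv = 400 / (math.log(10) * score * (1 - score))
    score_sd = math.sqrt(score * (1 - score) / games)
    return deriv * score_sd

# Chained method: net N is only compared to net N-1, so the per-match
# errors are independent and their variances add up along the chain.
per_match_err = elo_stderr(0.55, 400)  # assumed: 400 games per match
chained_err = lambda n: per_match_err * math.sqrt(n)

# Fixed-anchor method: every net plays the same reference net, so each
# net's error is just one match's error, no matter how many nets came before.
anchor_err = per_match_err

print(f"Elo diff at 55% score:        {elo_diff(0.55):.1f}")
print(f"per-match stderr:             {per_match_err:.1f} Elo")
print(f"chained error after 100 nets: {chained_err(100):.1f} Elo")
print(f"fixed-anchor error (any net): {anchor_err:.1f} Elo")
```

With these numbers the chained estimate is off by well over 100 Elo after a hundred nets, while the anchor estimate stays at a single match's error of under 20 Elo.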
Surely almost every other AB engine project does this when testing for stronger versions.
But for Leela Zero Go the chained method works much better because of the gating, so far fewer nets are compared (only one test run, only 189 nets in 12 months!).
But of course there is still inaccuracy, and "self-play Elo inflation" of approximately a factor of 3.
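(So, to give a rough example based on that factor, a 300 Elo gain measured in self-play would correspond to only about 100 Elo of real strength against independent opponents.)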
@Trevor: Yes, surely this is the case with Leela Zero Go. You see more signal and less noise.
But my point is that we can easily do even better than Leela Zero Go by using my method, which gets us approximately real Elo!
This is possible precisely because we don't use gating, so there is no need to compare against the previous net.
That is the point: LCZero abandoned gating but didn't change the method for estimating Elo.
Well, I should post my proposal in Discord to the developer team. Surely they have known about the problem all along but didn't fix it.
Probably the manpower to implement the change was lacking; I understand that. Perhaps I should offer to implement it myself.
Being a software developer, that should be possible.