PS: in NV21, the first 2/3 of the image data is the Y plane. The V and U samples are each 4x smaller than Y because chroma is 2x2 subsampled. NV21 interleaves them into a single VU plane, so together they take up the last 1/3.
So starting 1/3 of the way through the data lands exactly halfway into the Y plane: a w x h frame is 1.5*w*h bytes, and a third of that is 0.5*w*h.
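A quick sketch of that arithmetic, assuming a tightly packed buffer (row stride equal to width, no padding). The function name and the dict keys are made up for illustration:

```python
def nv21_layout(width: int, height: int) -> dict:
    """Byte offsets in a tightly packed NV21 frame (hypothetical helper)."""
    y_size = width * height                       # full-resolution luma, 1 byte per pixel
    vu_size = 2 * (width // 2) * (height // 2)    # V and U interleaved, each at quarter resolution
    total = y_size + vu_size                      # 1.5 * width * height for even dimensions
    return {
        "y_start": 0,
        "vu_start": y_size,       # chroma plane begins right after Y
        "total": total,
        "one_third": total // 3,  # equals y_size // 2, i.e. the middle of the Y plane
    }


layout = nv21_layout(640, 480)
print(layout["total"], layout["one_third"], layout["one_third"] // 640)
```

For 640x480 this prints 460800, 153600 and 240: the one-third offset is the first byte of row 240, the middle row of the Y plane.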
From there it doesn't much matter what you sample. It looks like you're sampling 1/3 of a row, though the comment says you're sampling a diagonal. Sampling a whole row might make more sense, but it doesn't matter a lot. You could also just take 1000 consecutive pixels from that point and be pretty fine. That's less principled, but it's simpler, and reading contiguous memory is more cache-friendly than hopping across rows the way a diagonal does.
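Something like this, as a rough sketch rather than a drop-in replacement. It assumes a tightly packed NV21 buffer, and `mean_luma` and its `count` parameter are hypothetical names. It averages a contiguous run of Y bytes starting at the midpoint, defaulting to one row's worth of pixels:

```python
from typing import Optional


def mean_luma(frame: bytes, width: int, height: int, count: Optional[int] = None) -> float:
    """Average brightness of a contiguous run of Y samples (hypothetical helper)."""
    y_size = width * height
    if len(frame) < y_size:
        raise ValueError("buffer is smaller than the Y plane")
    start = y_size // 2                     # halfway into Y (1/3 of the whole NV21 buffer)
    n = width if count is None else count   # default: one row's worth of pixels
    end = min(start + n, y_size)            # clamp so we never read into the VU plane
    run = frame[start:end]                  # one contiguous slice, no striding
    if not run:
        raise ValueError("nothing to sample")
    return sum(run) / len(run)
```

Passing `count=1000` gives the "just grab 1000 pixels" version. The clamp matters for small frames, where 1000 bytes past the midpoint would otherwise run into chroma data.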
I think there are a few more loose ends here, like making sure successive frames don't toggle it on and off rapidly (some hysteresis or debouncing), and handling the race condition where the user turns it on manually.
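One possible shape for both, as a sketch only: the class, its thresholds and the frame count are illustrative assumptions, not tuned values. Two separate thresholds give a hysteresis band, a streak counter requires several consecutive frames before flipping, a manual toggle disables automatic control, and a lock keeps the frame callback and the user action from interleaving if they arrive on different threads:

```python
import threading


class AutoTorch:
    """Hypothetical auto on/off controller with hysteresis, debouncing and manual override."""

    def __init__(self, on_below: float = 40.0, off_above: float = 80.0, frames_required: int = 5):
        if on_below >= off_above:
            raise ValueError("on_below must be lower than off_above")
        self.on_below = on_below                # turn on when darker than this
        self.off_above = off_above              # turn off when brighter than this
        self.frames_required = frames_required  # consecutive frames needed to flip
        self.enabled = False
        self.manual_override = False
        self._streak = 0
        self._lock = threading.Lock()

    def user_toggled(self, enabled: bool) -> None:
        with self._lock:
            # The user's choice wins: record it and stop automatic control.
            self.enabled = enabled
            self.manual_override = True
            self._streak = 0

    def on_frame(self, luma: float) -> bool:
        """Feed one frame's brightness and return the desired state."""
        with self._lock:
            if self.manual_override:
                return self.enabled
            if self.enabled:
                wants_flip = luma > self.off_above
            else:
                wants_flip = luma < self.on_below
            # Any frame that doesn't vote to flip resets the streak.
            self._streak = self._streak + 1 if wants_flip else 0
            if self._streak >= self.frames_required:
                self.enabled = not self.enabled
                self._streak = 0
            return self.enabled
```

Brightness values between the two thresholds never cause a change, and a single bright or dark frame resets the streak, so flicker around one threshold can't toggle it back and forth.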