You're correct, it will be distorted (it gets scaled without keeping the original proportions). I was thinking that since sampling is in the [0, 1] range it shouldn't matter much, and some simple tests looked OK, but I can see it producing artifacts, especially if the distortion is large.
The transform you're suggesting has to deal with the repeat mode too, because you don't want to sample the empty space, right?
So it would look something like this (uvMax being the UV extent of the non-empty area)?
uv = mod(uv * uvMax, uvMax)
This adds a bit of complexity, because now I have to handle the repeat mode in my shader. Also, if I don't care about mipmap filtering, then couldn't I just implement repeat mode in the shader, similarly to the above, without needing to convert the texture to POT?
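To sanity-check the formula, here is a rough CPU-side sketch in Python (not shader code) that emulates GLSL's mod() and compares mod(uv * uvMax, uvMax) against the equivalent fract(uv) * uvMax. The function names and the 0.75 value in the usage note are made up for illustration, and it only covers a single coordinate; the same would apply per component.

```python
import math

def glsl_mod(x, y):
    # GLSL defines mod(x, y) as x - y * floor(x / y)
    return x - y * math.floor(x / y)

def wrap_padded(uv, uv_max):
    # The formula from above: wraps into [0, uv_max), i.e. the
    # non-empty part of the padded texture
    return glsl_mod(uv * uv_max, uv_max)

def wrap_fract(uv, uv_max):
    # Same thing written as "repeat in [0, 1) first, then scale":
    # fract(uv) * uv_max
    return (uv - math.floor(uv)) * uv_max

def wrap_repeat(uv):
    # Plain shader-side repeat for an unpadded texture: fract(uv)
    return uv - math.floor(uv)
```

For example, with a hypothetical uvMax of 0.75, both wrap_padded(1.5, 0.75) and wrap_fract(1.5, 0.75) give 0.375, so the two forms agree, including for negative coordinates. One thing this sketch does not model is filtering: as far as I understand, with linear filtering a sample right at the uvMax edge would still blend with the padding texels instead of wrapping to the opposite edge, so a seam may show up there.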