Hmm, you are right. I suppose if you found the image was oriented
perfectly upright, and not at all perspective-distorted, you could
replace the expensive perspective transformation with a simple scale and
translate. In practice, though, I think it would be very rare to
encounter this in a camera photo.
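A rough sketch of the difference, assuming the three finder pattern centers have already been located, that finder centers sit 3.5 modules in from the symbol edge (as in a QR Code), and that the perspective mapping is a 3x3 matrix from module space to image space. The function names are hypothetical, not the library's API. The general mapping needs a per-point division, and computing its matrix requires a small linear solve. The upright case needs neither.

```python
def is_upright(tl, tr, bl, tol=1.0):
    """True if finder centers form an upright, unskewed square (within tol pixels)."""
    return (abs(tr[1] - tl[1]) <= tol                           # top edge horizontal
            and abs(bl[0] - tl[0]) <= tol                       # left edge vertical
            and abs((tr[0] - tl[0]) - (bl[1] - tl[1])) <= tol)  # same scale on both axes


def module_center_upright(col, row, tl, tr, dimension):
    """Scale-and-translate: pixel at the center of module (col, row)."""
    # Finder centers are 3.5 modules from each edge, so they are (dimension - 7) modules apart.
    pitch = (tr[0] - tl[0]) / (dimension - 7)
    # Module center is at index + 0.5; subtracting the finder offset of 3.5 leaves index - 3.
    return (tl[0] + (col - 3) * pitch, tl[1] + (row - 3) * pitch)


def module_center_perspective(m, col, row):
    """General projective mapping with a row-major 3x3 matrix m."""
    u, v = col + 0.5, row + 0.5
    den = m[2][0] * u + m[2][1] * v + m[2][2]  # per-point division the upright path avoids
    return ((m[0][0] * u + m[0][1] * v + m[0][2]) / den,
            (m[1][0] * u + m[1][1] * v + m[1][2]) / den)
```

When the bottom row of the matrix is (0, 0, 1) and the off-diagonal terms are zero, the perspective form collapses to exactly the scale-and-translate form, which is why the shortcut is only safe once `is_upright` holds.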
However, if you are decoding a "perfect" image of a QR Code, like those
you might find on the web, you know it is perfectly upright and not
skewed. For that case we do have a decoder hint type called
"PURE_BARCODE", which tells the decoder to assume the image is a pure
monochrome image that is neither rotated nor skewed; with that
assumption it can detect the QR Code much faster. I think that would be
our answer to your point.
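To illustrate why the pure-image assumption is so much cheaper, here is a simplified sketch. It is not the library's actual code. It assumes a binary image given as a list of rows (1 = black), an integer number of pixels per module, a square symbol, and a black top-left module (true for a QR finder pattern). There is no finder search and no perspective transform: just a bounding box, a module size, and grid sampling. `extract_pure_bits` is a hypothetical name.

```python
def extract_pure_bits(img):
    """Sample the module grid of an upright, unskewed, pure black/white symbol."""
    black = [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    if not black:
        return None
    left = min(x for x, _ in black)
    right = max(x for x, _ in black)
    top = min(y for _, y in black)
    bottom = max(y for _, y in black)
    # The finder's outer ring is one module thick, so the run of black pixels
    # along the diagonal from the top-left corner gives the module size.
    size = 0
    while top + size <= bottom and left + size <= right and img[top + size][left + size]:
        size += 1
    width = right - left + 1
    if size == 0 or width % size or (bottom - top + 1) != width:
        return None  # not a clean, square, integer-scaled symbol
    dim = width // size
    half = size // 2
    # Read the pixel at the center of every module.
    return [[img[top + r * size + half][left + c * size + half] for c in range(dim)]
            for r in range(dim)]
```

A real camera image breaks every assumption here at once (rotation, skew, uneven modules), which is why the general path is so much heavier.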
You are definitely right -- it is possible that the top right and
bottom left finder patterns are not the two most distant of the three
points. Imagine a photograph taken at an extremely flat angle along the
diagonal between the top right and bottom left finder patterns. But in
the normal case this heuristic works.
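A minimal sketch of the "most distant pair" heuristic, assuming finder centers are given as (x, y) image coordinates with y pointing down. The function name is hypothetical. The point left out of the most distant pair is taken as top left, and the sign of a 2D cross product decides which of the other two is top right.

```python
import math


def order_finder_patterns(p0, p1, p2):
    """Guess (top_left, top_right, bottom_left) from three finder-pattern centers."""
    pts = [p0, p1, p2]
    # For each candidate top-left, measure the distance between the other two points;
    # the candidate whose opposite pair is longest is assumed to be top-left.
    best = max(range(3), key=lambda i: math.dist(pts[(i + 1) % 3], pts[(i + 2) % 3]))
    top_left = pts[best]
    a, b = pts[(best + 1) % 3], pts[(best + 2) % 3]
    # Cross product (a - tl) x (b - tl). With y pointing down, a positive value
    # means a is top-right and b is bottom-left; otherwise swap them.
    cross = ((a[0] - top_left[0]) * (b[1] - top_left[1])
             - (a[1] - top_left[1]) * (b[0] - top_left[0]))
    if cross < 0:
        a, b = b, a
    return top_left, a, b
```

For an upright symbol with centers (10, 10), (110, 10) and (10, 110), any input order yields that ordering. With heavy foreshortening along the diagonal, say a true top left at (0, 0) with the others at (100, 40) and (40, 100), the top-right/bottom-left pair is no longer the longest and (0, 0) is not chosen as top left, which is exactly the failure case described above.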
I do not know of a more reliable way to determine this, other than
perhaps examining the image in more complex ways. For example, you
should be able to confirm your guess by looking at where black/white
pixels occur in relation to these points. If your guess is right, you
should see black/white pixels extending roughly out to where you'd
expect the fourth corner to be. This is a good idea, though it may be
too expensive relative to the value it adds.
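One possible way to do that check, sketched under loose assumptions: the fourth corner is estimated by completing a parallelogram (which ignores perspective), and "black/white pixels" is approximated crudely as "a small window contains both colors". The image is a list of rows with 1 = black. All names and thresholds are hypothetical.

```python
def estimate_fourth_corner(tl, tr, bl):
    # Parallelogram completion: ignores perspective, so it is only a rough guess.
    return (tr[0] + bl[0] - tl[0], tr[1] + bl[1] - tl[1])


def looks_like_code(img, x, y, r=1):
    """True if the window around (x, y) contains both black and white pixels."""
    seen = set()
    for yy in range(y - r, y + r + 1):
        for xx in range(x - r, x + r + 1):
            if 0 <= yy < len(img) and 0 <= xx < len(img[0]):
                seen.add(img[yy][xx])
    return {0, 1} <= seen


def corner_support(img, tl, tr, bl, steps=20):
    """Fraction of samples between the TR/BL midpoint and the guessed fourth
    corner that look like symbol data rather than empty background."""
    cx, cy = estimate_fourth_corner(tl, tr, bl)
    mx, my = (tr[0] + bl[0]) / 2, (tr[1] + bl[1]) / 2
    hits = 0
    for i in range(steps + 1):
        t = i / steps
        hits += looks_like_code(img, round(mx + (cx - mx) * t), round(my + (cy - my) * t))
    return hits / (steps + 1)
```

You could score each candidate ordering this way and keep the best. The cost is roughly `steps` small windows per candidate, which is the expense-versus-value trade-off in question.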
I would be very interested to hear the outcomes of your research. This
library is weakest at handling blur, whether from the camera's focal
length or from motion, and it currently does not account for shadows (I
have experimented with this recently and have not found a cheap
improvement that substantially improves performance).