Standard CNN architectures expect a fixed-size input image, but there is no requirement that it be square. For these networks one must either resize or crop images.
For large datasets, where images come in varying aspect ratios (landscape or portrait), a square input is a reasonable default: if you must resize or crop and have no specific reason to pick any other aspect ratio, square is the one to go with. That said, other aspect ratios can work as well (or better) depending on the data and task at hand.
Regarding resizing vs. cropping...
Generally, if one knows there is some level of consistency within a dataset (say, images of eyeballs all at the same resolution), then it makes sense to crop images to the same size, since this data 'registration' can only help the network achieve its goal.
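For the registered-dataset case, cropping is just a matter of cutting a fixed-size window out of each image. A minimal pure-Python sketch (a real pipeline would use something like `torchvision.transforms.CenterCrop` or PIL, and `center_crop` here is an illustrative name, not a library function):

```python
def center_crop(image, crop_h, crop_w):
    """Cut a crop_h x crop_w window from the centre of a 2-D image.

    `image` is a list of rows (e.g. grayscale pixel values); this sketch
    assumes the image is at least as large as the crop in both dimensions,
    which is exactly the consistency assumption that makes cropping safe.
    """
    h, w = len(image), len(image[0])
    top = (h - crop_h) // 2
    left = (w - crop_w) // 2
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]

# A 4x6 "image" with pixel value r*10 + c; crop the central 2x4 region.
img = [[r * 10 + c for c in range(6)] for r in range(4)]
patch = center_crop(img, 2, 4)  # [[11, 12, 13, 14], [21, 22, 23, 24]]
```

Because every image is the same resolution, the same crop box lands on the same anatomical region each time, which is the whole point of exploiting registration.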
If, on the other hand, you have a diverse dataset with thousands of classes, images of arbitrary resolution, and objects found anywhere in each image, then it often makes sense to resize everything to the same size. The reason is that in order to crop, one would have to know what size to crop, and without more information that question is difficult to answer. But say you somehow find a good crop size; the next question becomes where in the image to crop. Crop in the wrong place and you could completely crop out the object you are trying to classify. And what if some images are smaller than the input size of the network, how would cropping work then? Resizing sidesteps all of these issues, which is why (for such datasets) it is often chosen over cropping.
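Resizing handles every case uniformly: the same operation works whether the source is larger or smaller than the target, and no placement decision is needed. A minimal nearest-neighbour sketch in pure Python (real code would use PIL's `Image.resize` or `torchvision.transforms.Resize`; `resize_nearest` is an illustrative name):

```python
def resize_nearest(image, out_h, out_w):
    """Resize a 2-D image to out_h x out_w by nearest-neighbour sampling.

    Each output pixel (i, j) is taken from the proportionally located
    source pixel. Works for both downscaling and upscaling, which is what
    makes resizing a one-size-fits-all preprocessing step.
    """
    in_h, in_w = len(image), len(image[0])
    return [
        [image[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]

# Downscale a 4x4 image to 2x2, and upscale a 2x2 image to 4x4:
small = resize_nearest([[1, 2, 3, 4]] * 4, 2, 2)   # [[1, 3], [1, 3]]
big = resize_nearest([[1, 2], [3, 4]], 4, 4)       # each pixel repeated 2x2
```

Note that mapping an arbitrary aspect ratio onto a square output is exactly where the distortion mentioned below comes from: the horizontal and vertical sampling rates differ.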
Yes, resizing will distort the images, but networks have generally been found to still work great, as the representations learned by the CNN simply operate in this distorted space. It just works.
See this paper, which resizes arbitrary words to a non-square, fixed input size:
Reading Text in the Wild with Convolutional Neural Networks