I'm trying to use the excellent open-source implementation of RetinaNet with Keras (https://github.com/fizyr/keras-retinanet), and I've noticed a small detail that I would like to clarify. In the preprocessing stage, the image is read (function read_image_bgr in keras-retinanet/utils/image.py, link to code) and returned as an array in (height, width) order. The code is:

```python
from PIL import Image
```

For example, my (640, 480) image (in (width, height) order) is returned as a numpy array of shape (480, 640, 3). This image array is later fed to the model, which expects a tensor whose dimensions are (batch, width, height, channels). I understand that this is simply a convention and, as long as the model processes the bounding boxes the same way, it is probably harmless. But I would like to be sure whether this is the intended behavior.
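To make the shape question concrete, here is a minimal sketch of the behavior I'm describing (not the library's exact code): PIL reports `Image.size` as (width, height), while `np.asarray` produces a (height, width, channels) array, and `read_image_bgr` additionally returns BGR channel order.

```python
import numpy as np
from PIL import Image

# Create a 640x480 image; PIL reports size as (width, height).
img = Image.new("RGB", (640, 480))
print(img.size)  # (640, 480)

# numpy uses (rows, cols, channels) = (height, width, channels).
arr = np.asarray(img.convert("RGB"))
print(arr.shape)  # (480, 640, 3)

# Sketch of the RGB -> BGR flip done by read_image_bgr: reverse the
# channel axis; the spatial (height, width) order is unchanged.
bgr = arr[:, :, ::-1]
print(bgr.shape)  # (480, 640, 3)
```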
As I'm using a ResNet backbone and want to do transfer learning, a corollary question is: in what order does the ResNet model expect images, (w, h) or (h, w)? I suppose that if the pretrained weights I'm planning to use were learned on images in (w, h) order, then I should use that order too, right?
Thanks for your help!
Patrick