The reference FCN code at
fcn.berkeleyvision.org reshapes the network to each input, so every batch can have different dimensions.
While the inputs within a computational batch must all have the same dimensions, since they are packed into a single array, a learning batch can still be composed of different-sized inputs by accumulating gradients across iterations. To do so, set the `iter_size` field of the solver to a value > 1. With the usual FCN batch size of 1, `iter_size` is effectively the learning batch size.
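As a sketch, a solver configuration along these lines would accumulate gradients over 20 iterations (the file name and learning rate here are placeholders, not values from the reference code):

```
# Hypothetical solver.prototxt fragment: each forward/backward pass sees a
# single (possibly differently sized) image, but gradients are summed over
# 20 iterations before the weight update, giving an effective batch of 20.
net: "train_val.prototxt"   # assumed net definition file
base_lr: 1e-10
momentum: 0.9
iter_size: 20               # accumulate gradients over 20 iterations
```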
That said, I advise batch size == 1 with high momentum over accumulating gradients, or at least trying that first. In my experiments on semantic segmentation, online learning over single images in this way has trained faster.
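For comparison, the online-learning alternative is a one-field change (again with placeholder file name and learning rate):

```
# Hypothetical solver.prototxt fragment: true online learning, one image
# per weight update, with high momentum smoothing the noisy single-image
# gradient instead of explicit accumulation.
net: "train_val.prototxt"   # assumed net definition file
base_lr: 1e-10
momentum: 0.99              # high momentum compensates for batch size 1
iter_size: 1                # no gradient accumulation
```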