It looks like your data is 1-channel (grayscale) but the mean file is 3-channel (RGB). At least that's what the following Transformer error indicates:
It would be useful to see a more complete log, e.g. the actual size of the data and the shape of the mean.
To answer your first question: the mean file should be the average of all images in your training dataset, i.e. a mean image. The alternative is a per-channel mean, in which case you only need to supply one average value per channel (R, G, B), which is then subtracted from every pixel of every image you feed into the network.
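As a rough illustration of the two options, here is a minimal NumPy sketch. It does not call the framework itself; the random `images` array, the channels-first layout, and the `channels_match` helper are assumptions made up for the example, not part of your setup.

```python
import numpy as np

# Hypothetical stand-in for a training set: 4 RGB images, channels-first (N, C, H, W).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 3, 8, 8)).astype(np.float32)

# Option 1: mean image. Average over the image axis only,
# which gives one value per channel and pixel position.
mean_image = images.mean(axis=0)             # shape (3, 8, 8)

# Option 2: one value per channel. Average over images and
# over all pixel positions.
channel_mean = images.mean(axis=(0, 2, 3))   # shape (3,)

# Subtracting either mean from the data.
centered_by_image = images - mean_image
centered_by_channel = images - channel_mean[:, None, None]


def channels_match(data_shape, mean_shape):
    """Hypothetical check: do data and mean agree on the channel count?"""
    return data_shape[0] == mean_shape[0]


# A grayscale input (1 channel) does not fit a 3-channel mean.
gray_shape = (1, 8, 8)
print(channels_match(gray_shape, mean_image.shape))        # False
print(channels_match(images.shape[1:], mean_image.shape))  # True
```

If your data really is grayscale, the mean has to be computed from (or reduced to) a single channel as well, whichever of the two options you pick.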