A. Where your geometry allows, give each processor a similar number of cells (you can assign more than one mesh to a processor). This limits the time processors sit idle waiting for data from other processors. For example, if you have four processors and your geometry is best divided into meshes of 40,000 cells, 20,000 cells, and two of 10,000 cells, it may make more sense to use two processors rather than four: one takes the 40,000-cell mesh and the other takes the remaining three meshes (also 40,000 cells in total). That leaves you two processors to run another simulation on.
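The arithmetic in point A can be checked with a short sketch. This is only an illustration of balancing cell counts, assuming a simple "largest mesh first, onto the least loaded processor" rule; the function name `assign_meshes` is made up for this example and is not part of FDS.

```python
def assign_meshes(cell_counts, n_procs):
    """Greedy sketch: place the largest meshes first, each on the least loaded processor.

    Returns (cells per processor, mesh indices per processor).
    """
    loads = [0] * n_procs
    assignment = [[] for _ in range(n_procs)]
    # Visit meshes from largest to smallest cell count.
    order = sorted(range(len(cell_counts)), key=lambda i: cell_counts[i], reverse=True)
    for i in order:
        p = loads.index(min(loads))  # least loaded processor so far
        loads[p] += cell_counts[i]
        assignment[p].append(i)
    return loads, assignment


meshes = [40_000, 20_000, 10_000, 10_000]  # cell counts from the example in point A
for n in (4, 2):
    loads, assignment = assign_meshes(meshes, n)
    print(n, "processors:", loads, assignment)
```

With four processors the busiest one still carries 40,000 cells, so the other three spend much of each time step waiting. With two processors both carry 40,000 cells, which is the case described above.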
B. Where it makes sense, divide the overall domain so that the meshes look more like cubes; this generally reduces the number of cells that need to share information. For example, say your domain is 4 m x 4 m x 4 m with a 0.1 m grid resolution (40 x 40 x 40 cells). You could split it into 4 equal meshes stacked vertically (4 meshes of 40 x 40 x 10), or into a 2 x 2 arrangement with one vertical and one horizontal cut (4 meshes of 20 x 40 x 20). In the first case there are 3 shared boundaries of 40 x 40 cells each, for 4,800 wall cells that need to pass information. In the second case there are 4 shared boundaries of 20 x 40 cells each, for a total of 3,200 wall cells.
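The counts in point B can be reproduced with a few lines. This is a minimal sketch that assumes a regular block decomposition and counts each shared boundary once, as the example above does; `interface_cells` is a hypothetical helper, not an FDS function.

```python
def interface_cells(cells, splits):
    """Count cells on shared mesh boundaries for a regular block decomposition.

    cells  -- (nx, ny, nz) cells in the whole domain
    splits -- (px, py, pz) number of meshes along each axis
    Each shared boundary is counted once (one side only).
    """
    nx, ny, nz = cells
    px, py, pz = splits
    return ((px - 1) * ny * nz    # cutting planes normal to x
            + (py - 1) * nx * nz  # cutting planes normal to y
            + (pz - 1) * nx * ny) # cutting planes normal to z


domain = (40, 40, 40)
print(interface_cells(domain, (1, 1, 4)))  # four slabs of 40 x 40 x 10
print(interface_cells(domain, (2, 1, 2)))  # 2 x 2 arrangement of 20 x 40 x 20
```

The slab split gives 4,800 cells and the 2 x 2 split gives 3,200, matching the figures above.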
C. There is a point of diminishing returns when adding more processors. FDS computes an independent solution on each mesh, and the solutions are then stitched together. With too few cells per mesh you may start to affect the quality of the solution. Also, adding more meshes just to use more processors can reach a point where the cost of the additional communication outweighs the time saved by the additional processor.
D. The total number of mesh cells in the simulation should be driven by the grid resolution required to get good results, not by some limit on grid cells per processor. If your problem needs 4 million cells and you have 4 cores, you shouldn't coarsen your grid resolution to bring it down to 2 million cells.
E. Be careful where you place mesh boundaries. A poorly placed boundary can force FDS to iterate the pressure solution to achieve convergence of the mass and energy flows across it. The extra time this takes can quickly negate any benefit of having divided one mesh into two. Try to avoid placing a mesh boundary so that it creates a very small isolated region in the adjacent mesh, or placing one right at the edge of a vent with a strong flow. An isolated region looks like the example below, where S is solid, + is gas, and | is a mesh boundary.
This mesh arrangement
S S S | + + +
S + + | + + +
S S S | + + +
+ + + | + + +
+ + + | + + +
+ + + | + + +
+ + + | + + +
Might be better as the following, where - is the mesh boundary:
S S S + + +
S + + + + +
S S S + + +
- - - - - -
+ + + + + +
+ + + + + +
+ + + + + +
+ + + + + +
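To make the "isolated region" idea concrete, here is a toy sketch that counts connected gas regions inside each mesh for the two arrangements drawn above. It is only a 2D illustration using a simple flood fill and is not how FDS itself handles pressure; the function name `gas_regions` is made up for this example.

```python
def gas_regions(rows):
    """Return the sorted sizes of 4-connected gas ('+') regions in one mesh."""
    seen = set()
    sizes = []
    for r, row in enumerate(rows):
        for c, cell in enumerate(row):
            if cell != "+" or (r, c) in seen:
                continue
            # Flood fill from this unvisited gas cell.
            stack, size = [(r, c)], 0
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                size += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < len(rows) and 0 <= nx < len(rows[ny])
                            and rows[ny][nx] == "+" and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            sizes.append(size)
    return sorted(sizes)


# The domain from the diagrams above, without the mesh boundary markers.
domain = [
    "SSS+++",
    "S+++++",
    "SSS+++",
    "++++++",
    "++++++",
    "++++++",
    "++++++",
]

# First arrangement: vertical mesh boundary after the third column.
left = [row[:3] for row in domain]
right = [row[3:] for row in domain]
# Second arrangement: horizontal mesh boundary after the third row.
top, bottom = domain[:3], domain[3:]

print("vertical split:  ", gas_regions(left), gas_regions(right))
print("horizontal split:", gas_regions(top), gas_regions(bottom))
```

With the vertical boundary, the left mesh contains a separate 2-cell gas pocket next to its main 12-cell region. With the horizontal boundary, each mesh contains a single connected gas region.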