-np sets the number of MPI processes. To set the number of OpenMP threads per process, you need to set the OMP_NUM_THREADS environment variable. See this wiki for an example batch file:
You may wish to review Chapter 3 in the User's Guide. General experience is that OpenMP provides only a limited speedup, because not all routines in FDS are split over OpenMP threads. On the NIST cluster (Fig. 3.2), speedup tops out around a factor of 2 at four threads. You may be better off dividing your domain into more meshes. 96 meshes will give similar speedup to 48 meshes with 10 cores.

You can run the benchmark case used for the plot to determine the performance of your own cluster. The OpenMP cases are in Examples\Timing_Benchmarks where you installed FDS. You would want to run these with 1 MPI process and 1, 2, 3, 4, etc. threads, one at a time, on a node with no other jobs running. There are two base cases, with 64^3 or 128^3 grid cells. The a, b, c, etc. variants are identical and exist just so you don't overwrite results as you test different thread counts.
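As a rough sketch of the benchmark procedure described above, the Python helper below pairs each thread count with its own case variant and builds the matching command line, then computes speedup relative to the single-thread run. The base name `openmp_test64`, the executable name `fds`, and the `mpiexec` launcher are assumptions here, not confirmed details; check the file names in your Timing_Benchmarks folder and the launcher your cluster uses.

```python
import string


def plan_runs(base_case, max_threads):
    """Pair each thread count 1..max_threads with its own case variant (a, b, c, ...).

    base_case is a hypothetical base file name such as "openmp_test64";
    adjust it to match the files in your installation.
    """
    runs = []
    for threads in range(1, max_threads + 1):
        letter = string.ascii_lowercase[threads - 1]  # 1 -> a, 2 -> b, ...
        runs.append({
            # Threads per process are controlled by this environment variable.
            "env": {"OMP_NUM_THREADS": str(threads)},
            # Always 1 MPI process; "fds" as the executable name is an assumption.
            "cmd": ["mpiexec", "-np", "1", "fds", f"{base_case}{letter}.fds"],
        })
    return runs


def speedup(wall_times):
    """Speedup of each run relative to the first (single-thread) run."""
    base = wall_times[0]
    return [base / t for t in wall_times]


if __name__ == "__main__":
    # Print the planned commands; run them one at a time on an otherwise idle node.
    for run in plan_runs("openmp_test64", 4):
        print(f"OMP_NUM_THREADS={run['env']['OMP_NUM_THREADS']}", " ".join(run["cmd"]))
```

Once each case has finished, pass the measured wall-clock times, in order of thread count, to `speedup` to see where your cluster levels off.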