This really depends on your code. One thing to keep in mind: if you start your script every time you classify an image, then each run spends time loading the neural net into memory before it can classify anything.
One option is to collect a batch of images, load the neural net once, and then loop through the batch.
A better way requires a bit more work: set up a script that keeps a socket open and waits for images as input.
Once the script has loaded the neural net, you won't have to worry about loading it again until you restart the script. Then just process the incoming images one at a time, which should be faster.
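A rough sketch of such a server using Python's standard `socketserver` module. Everything model-related is assumed. `load_model`, `classify`, and `"model.bin"` are hypothetical placeholders. The wire format, a 4-byte big-endian length prefix followed by the raw image bytes, is an arbitrary choice made for this example.

```python
import socket
import socketserver

# Hypothetical placeholders: swap in your framework's real calls.
def load_model(path):
    return {"path": path}

def classify(model, image_bytes):
    # Placeholder: a real classifier would decode the image and run the net.
    return f"{len(image_bytes)} bytes"

# Loaded once when the module is imported, so every request reuses it.
MODEL = load_model("model.bin")

def recv_exact(sock, n):
    """Read exactly n bytes from sock, or raise if the peer disconnects."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before all bytes arrived")
        buf += chunk
    return buf

class ClassifyHandler(socketserver.BaseRequestHandler):
    # Assumed wire format: 4-byte big-endian length, then the image bytes.
    def handle(self):
        size = int.from_bytes(recv_exact(self.request, 4), "big")
        image = recv_exact(self.request, size)
        self.request.sendall(classify(MODEL, image).encode() + b"\n")

def serve(port, host="127.0.0.1"):
    # A plain TCPServer handles one connection at a time, so images are
    # processed sequentially by the single loaded model.
    with socketserver.TCPServer((host, port), ClassifyHandler) as server:
        server.serve_forever()

def send_image(port, data, host="127.0.0.1"):
    """Client helper: send one image and return the label line."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(len(data).to_bytes(4, "big") + data)
        with sock.makefile("rb") as reply:
            return reply.readline().decode().strip()
```

To run several servers, start `serve(5000)` in one process and `serve(5001)` in another. The port numbers are arbitrary. Each process holds its own copy of the model, which is why RAM and CPU become the limiting factors.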
With this approach you can start several servers on different ports. The only bottlenecks would then be the amount of RAM and CPU being used.
I suspect the CPU would become the bottleneck before you fill up the RAM.
Also, yes, use OpenBLAS; you will see a benefit from it.
Lastly, if you still need more speed, you might want to invest in a GPU.
I run a GTX 980, and once the neural net has been loaded it classifies 256x256 images in 25 ms per image.
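To put that figure in perspective, here is a back-of-the-envelope conversion to throughput. It assumes images are classified strictly one after another, with no extra I/O or preprocessing overhead:

$$
\frac{1000\ \text{ms/s}}{25\ \text{ms/image}} = 40\ \text{images/s}
$$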
Regards,
Andriy