[...Namely 20 Mnps on 1 thread would be much more effective than 20Mnps on let say 16 threads]
Yes, correct! Multi-threading adds some overhead. It depends of course on how you define the word "effective" here.
In terms of chess strength, both cases are the same [same Mnps = same number of positions searched, although I'm not so certain about this :)], but in the latter one you tie up more resources.
The truth is that performance does not increase linearly with the number of cores.
What actually saves us is that if we achieve 20 Mnps on 1 core, we can expect much more on 16 cores. (Note that I use the word core and not thread.)
Whether this is worth all the extra programming effort is another matter.
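
To put rough numbers on the non-linear scaling, here is a minimal Python sketch using Amdahl's law. This is a generic textbook model, not a measurement of any particular engine: the function name `amdahl_speedup` is made up for illustration, and the parallel fraction of 0.95 is an assumed value chosen only to show the shape of the curve. The 20 Mnps single-core figure is the one from the discussion above.

```python
def amdahl_speedup(cores: int, parallel_fraction: float) -> float:
    """Upper bound on speedup when only part of the work can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


SINGLE_CORE_MNPS = 20.0    # figure taken from the discussion
PARALLEL_FRACTION = 0.95   # ASSUMPTION: illustrative only, not measured

if __name__ == "__main__":
    for cores in (1, 2, 4, 8, 16):
        speedup = amdahl_speedup(cores, PARALLEL_FRACTION)
        mnps = SINGLE_CORE_MNPS * speedup
        print(f"{cores:2d} cores: speedup x{speedup:.2f}, roughly {mnps:.0f} Mnps")
```

Under that assumed fraction, 16 cores give a speedup of a bit over 9x rather than 16x, so still "much more" than one core, just not linear. Real engines will behave differently, and raw nps is not the same thing as playing strength, so treat this only as a picture of the general effect.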