I believe I first saw it at a CMG conference in the mid-90s, if memory serves. My impression is that it was a piece of IBM folklore that had been developed empirically to estimate LPAR capacity for MVS workloads. I've never seen a mathematical derivation anywhere, other than my own.
My derivation comes from observing that the M/M/m server appears to "morph" from a parallel queueing facility at low traffic to a single high-speed M/M/1 server under heavy traffic. Strange but true. As I've said elsewhere: there is no parallelism, just fast serialization. :) The question then becomes, what's the morphing function? The simplest candidate is a finite geometric series, and that's what you see in my book. Easy to understand and pretty close to exact. Good enough for Guerrillas.
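To make the morphing idea concrete, here is a minimal sketch. It assumes the heuristic takes the form R ≈ S / (1 − ρ^m), where S is the service time, ρ the per-server utilization, and m the number of servers. The function names are hypothetical, chosen only for illustration. The finite geometric series 1 + ρ + ... + ρ^(m−1) acts as the morphing function: it is close to 1 at low load, so R stays near S, and approaches m at heavy load, so the facility behaves like an M/M/1 server running m times faster.

```python
def morphing_factor(rho: float, m: int) -> float:
    """Finite geometric series 1 + rho + ... + rho**(m - 1).

    Assumed form of the morphing function: about 1 at low load and
    about m as rho approaches 1.
    """
    return sum(rho**k for k in range(m))


def residence_time_heuristic(service_time: float, rho: float, m: int) -> float:
    """Approximate M/M/m residence time (hypothetical helper).

    Treats the facility as an M/M/1 server whose speed-up is the
    morphing factor: R = S / (phi * (1 - rho)), which collapses
    algebraically to S / (1 - rho**m).
    """
    if m < 1:
        raise ValueError("m must be a positive integer")
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must satisfy 0 <= rho < 1")
    return service_time / (morphing_factor(rho, m) * (1.0 - rho))


if __name__ == "__main__":
    # Same per-server utilization, increasing number of servers.
    for servers in (1, 2, 4, 8):
        r = residence_time_heuristic(1.0, 0.75, servers)
        print(f"m={servers}: R ~ {r:.4f}")
```

With m = 1 the expression reduces to the familiar M/M/1 result S / (1 − ρ), which is a quick sanity check on the assumed form.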
I call it "heuristic" because it diverges from the exact formula by about 10% in the worst case. The reason for that lies, in part, in the way the M/M/m waiting line forms, i.e., buffering. Parallel queues (as the name suggests) each have their own buffer that can possibly become occupied, even briefly, at low loads, due to arrival fluctuations. But there aren't really m buffers in M/M/m; there's only one. Expressing that uniqueness analytically, however, rapidly spawns a jungle of unintuitive mathematical terms (Gamma sums, etc.). Nonetheless, I can point you at terms in the jungle that do indeed correspond to a finite geometric series (morphing), but no longer in a way that makes intuitive sense or relates to the morphing model in an obvious fashion (e.g., simple correction terms).
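One way to see the size of the divergence is to compare the heuristic against the exact M/M/m residence time computed from the Erlang C function. The sketch below assumes the heuristic has the form S / (1 − ρ^m) and uses the standard Erlang B recursion to obtain Erlang C. All function names are hypothetical, and the loop simply prints whatever relative errors result for the chosen m and ρ.

```python
def erlang_c(m: int, rho: float) -> float:
    """Probability that an arrival has to wait in M/M/m (Erlang C).

    Computed from the numerically stable Erlang B recursion with
    offered load a = m * rho.
    """
    a = m * rho
    b = 1.0  # Erlang B with zero servers
    for k in range(1, m + 1):
        b = a * b / (k + a * b)
    return b / (1.0 - rho * (1.0 - b))


def residence_time_exact(service_time: float, rho: float, m: int) -> float:
    """Exact M/M/m residence time: service time plus mean waiting time."""
    return service_time * (1.0 + erlang_c(m, rho) / (m * (1.0 - rho)))


def residence_time_heuristic(service_time: float, rho: float, m: int) -> float:
    """Assumed form of the heuristic: S / (1 - rho**m)."""
    return service_time / (1.0 - rho**m)


def relative_error(rho: float, m: int) -> float:
    """Signed error of the heuristic as a fraction of the exact value."""
    exact = residence_time_exact(1.0, rho, m)
    return (exact - residence_time_heuristic(1.0, rho, m)) / exact


if __name__ == "__main__":
    for servers in (1, 2, 4, 8, 16):
        for load in (0.5, 0.7, 0.9):
            err = relative_error(load, servers)
            print(f"m={servers:2d} rho={load:.1f} error={100 * err:6.2f}%")
```

Under this assumed form the two expressions coincide algebraically for m = 1 and m = 2, so any discrepancy only shows up from m = 3 onward.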
Although it's just an idle (can I use that word?) curiosity, I've always felt it ought to be possible to derive the exact M/M/m formula intuitively, along the lines of the morphing approach, without relying on probability theory, Markov chains, or any other sophisticated clutter. Then I could actually say I understood M/M/m. Well, I have discovered many different ways of deriving it (could easily be a paper on its own), but I'm not happy with any of them from an explanatory standpoint: my sole motivation being to provide more insight, not more fancy math. :/