Hello,
I have known about Martin Thompson's excellent work for a while now, and recently I wanted to better understand some queueing theory he discussed on the Arrested DevOps podcast.
He gave the example of a service having an average response time of 100ms, and this service receives on average 9 requests/second.
If the service response time is halved (to 50 ms), Martin said that the service becomes 20x more "reactive".
I was not sure exactly what he meant by "reactive", but I tried to work out where this 20x figure came from.
Here is what I came up with, and I would appreciate it if some experienced people could confirm whether my reasoning is correct.
I assume that Martin is considering the M/M/1 queue model. The theory then says that the mean waiting time in the queue (not the sojourn time) is ρ/(μ − λ), where λ is the arrival rate, μ is the service rate, and ρ = λ/μ is the utilization.
With the real numbers, we have:
- case 1 (service time = 100 ms, so μ = 10/s): waiting time = (9/10) / (10 − 9) = 0.9 s = 900 ms
- case 2 (service time = 50 ms, so μ = 20/s): waiting time = (9/20) / (20 − 9) ≈ 0.041 s ≈ 41 ms
And then 900 / 41 ≈ 22. So I think that when Martin says the system is more reactive, he is talking about the waiting time.
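To double-check the arithmetic, here is a quick sketch of the M/M/1 waiting-time formula above (this is just my own illustration, not anything from the podcast):

```python
def mm1_waiting_time(arrival_rate, service_rate):
    """Mean time a request waits in the queue for an M/M/1 system:
    W_q = rho / (mu - lambda), with rho = lambda / mu."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable when lambda >= mu")
    rho = arrival_rate / service_rate
    return rho / (service_rate - arrival_rate)

# 9 requests/s; service times of 100 ms (mu = 10/s) and 50 ms (mu = 20/s)
wq_slow = mm1_waiting_time(9, 10)
wq_fast = mm1_waiting_time(9, 20)

print(f"100 ms service: mean wait {wq_slow * 1000:.0f} ms")
print(f" 50 ms service: mean wait {wq_fast * 1000:.0f} ms")
print(f"ratio: {wq_slow / wq_fast:.1f}x")
```

Interestingly, the ratio comes out to exactly 22x here: (0.9 / 1) ÷ (0.45 / 11) = 22, which matches Martin's rough "20x" figure.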
So my next question is: should I expect this result to hold for our CPUs and memory as well?
I am particularly interested in this question because I read in the Google SRE Book that, for optimal resource utilization (and therefore optimal costs), they try to keep their CPUs almost always full of tasks.
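If the same M/M/1 model applied, keeping utilization near 100% would make waiting times explode, since W_q = ρ/(μ − λ) diverges as ρ → 1. A small sketch of my own (assuming μ = 10/s, i.e. a 100 ms mean service time) to illustrate the tension I see:

```python
service_rate = 10.0  # mu: 10 tasks/s, i.e. a 100 ms mean service time

# Mean waiting time W_q = rho / (mu - lambda) at increasing utilizations
for rho in (0.5, 0.9, 0.95, 0.99):
    arrival_rate = rho * service_rate
    wq = rho / (service_rate - arrival_rate)
    print(f"utilization {rho:.0%}: mean wait {wq * 1000:.0f} ms")
```

At 50% utilization the mean wait is 100 ms, but at 99% it is nearly 10 seconds, so I wonder how the "keep CPUs full" advice fits with latency goals.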
Thank you in advance for your responses and thoughts on this subject.
Michaël