CPU utilization is a very important metric that is often misused. It tells us how busy we are, but more importantly, how un-busy we are. For CPU utilization to be meaningful, it needs to indicate the amount of idle time the system is experiencing. CPU utilization % is the most important and succinct metric a sysadmin or a capacity planner can look at: when properly measured, it shows how much unused capacity a system has, and how far from "full" it is. Most sane sysadmins will keep peak sustained utilization levels somewhere between 20-60% of the levels that they know are "demonstrably safe" for the application at hand, knowing that headroom is the #1 tool for stability under load. They also know that saturation is the mother of all evils when it comes to system availability and acceptable service levels, and use metrics like CPU utilization to stay as far away from saturation as their budgets will allow.
To the sane admin and the experienced capacity tester, the world is simple and very empirical: if a system was demonstrably able to happily support a workload of X at some far-from-saturated CPU % Y, then a sysadmin who is interested in job security can sleep well at night as long as the CPU load never goes above some level (say Y/3), and will call his boss asking for immediate budget when that line gets crossed. CPU % doesn't cover all the resources that need to be watched for how-far-from-empirical-saturation-am-I conditions (like network bit rates and disk i/o operation rates), but it does correctly summarize many others (CPU cycle use, coherency and memory bandwidth use, interrupts, etc.) into a single, easy-to-watch, how-far-from-trouble-am-I point of view.
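To make the rule of thumb concrete, here is a minimal sketch of the "demonstrated-safe divided by a safety factor" alert line. The numbers, the factor of 3, and the function names are illustrative assumptions, not recommendations for any particular system:

```python
# A minimal sketch of the Y/3 rule of thumb. All numbers here are
# illustrative assumptions, not recommendations for any real system.

def alert_threshold(demonstrated_safe_cpu_pct: float, safety_factor: float = 3.0) -> float:
    """CPU % at which to start asking for budget (the Y/3 rule)."""
    return demonstrated_safe_cpu_pct / safety_factor

# Example: capacity testing showed the app happily handling workload X
# at 60% CPU (Y = 60), so the alert line lands at 20%.
THRESHOLD = alert_threshold(60.0)

def should_call_the_boss(current_cpu_pct: float) -> bool:
    return current_cpu_pct > THRESHOLD

print(f"alert line: {THRESHOLD:.0f}%")  # alert line: 20%
print(should_call_the_boss(15.0))       # False: sleep well
print(should_call_the_boss(25.0))       # True: time to ask for budget
```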
Unfortunately, most benchmarking is done at CPU utilization levels that are nowhere near the range sane administrators will ever allow their systems to operate in, and fortunately sane sysadmins have long ago learned to completely disregard most synthetic benchmarks when trying to draw "how much can my stuff really handle" lines for production. Beyond the other follies often found in synthetic benchmarking, it is the practice of at-saturation testing, which is what most such benchmarks spend the majority of their time measuring, that leads to many mis-measurements. This arises quite simply from the fact that at saturation things behave differently, in multiple directions, than they would under the normal operating parameters that administrators work so hard to keep their systems in. That is, some things get much much better at saturation, some are much much worse, and only some things actually stay the same. Here are some key examples:
Some things that get unrealistically better:
- CPU cache efficiency and miss rates usually get better at or near saturation.
- I/O throughput carrying capacity usually gets much better at or near saturation.
- I/O efficiency improves, and the amount of CPU spent on handling each I/O drops dramatically, at or near saturation (see the sketch after this list).
- GC efficiency can often increase at or near saturation.
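One plausible mechanism behind the I/O items above is batching: near saturation, queues are deep, so each submission (syscall, interrupt, doorbell) carries many items, and the fixed per-operation overhead gets amortized. The toy model below is my own illustration of that effect, and the costs in it are made-up numbers:

```python
# Toy model of batching amortization near saturation. The costs below are
# made-up numbers for illustration only.

FIXED_COST_US = 10.0  # assumed fixed cost per I/O submission (syscall/interrupt), in us
PER_ITEM_US = 1.0     # assumed incremental cost per item, in us

def cpu_us_per_item(batch_size: int) -> float:
    """Average CPU time per item when items are submitted batch_size at a time."""
    return (FIXED_COST_US + PER_ITEM_US * batch_size) / batch_size

# Lightly loaded systems tend to see batch sizes near 1; saturated systems
# see deep queues and large batches, which looks close to 10x more
# "efficient" per item in this toy model.
for batch in (1, 4, 16, 64):
    print(f"batch={batch:3d}  ->  {cpu_us_per_item(batch):5.2f} us/item")
```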
Some things that get unrealistically worse:
- CPU scheduling delays and the resulting externally measured latency behavior often get dramatically worse at or near saturation, introducing entirely new behavior modes into latency measurements.
- I/O subsystem latency behavior (not those silly averages, but the real metrics that matter, like 99.9%'iles) usually gets dramatically worse at or near saturation as queueing effects kick in (see the sketch after this list).
- Power management on modern CPUs can often slow things dramatically at or near saturation levels.
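To see why the tails blow up, a simple queueing model is enough. The sketch below uses an M/M/1 queue (my choice of model, not anything measured here): response time is exponentially distributed with rate (mu - lambda), so the q-quantile is -ln(1-q)/(mu - lambda), and both the mean and the 99.9%'ile explode as utilization approaches 1:

```python
# M/M/1 back-of-the-envelope for latency vs. utilization. The 1 ms service
# time is an assumed number; the formulas are standard M/M/1 results.
import math

SERVICE_TIME_MS = 1.0  # assumed mean service time; mu = 1 / SERVICE_TIME_MS

def mm1_mean_ms(rho: float) -> float:
    """Mean response time at utilization rho."""
    return SERVICE_TIME_MS / (1.0 - rho)

def mm1_quantile_ms(rho: float, q: float) -> float:
    """q-quantile of response time (exponential with rate mu - lambda)."""
    return -math.log(1.0 - q) * SERVICE_TIME_MS / (1.0 - rho)

for rho in (0.2, 0.5, 0.8, 0.95, 0.99):
    print(f"rho={rho:4.2f}  mean={mm1_mean_ms(rho):7.1f} ms  "
          f"p99.9={mm1_quantile_ms(rho, 0.999):8.1f} ms")
# rho=0.20 -> p99.9 ~ 8.6 ms; rho=0.99 -> p99.9 ~ 690.8 ms
```

The point is not this specific model, but the shape: the same hardware that shows benign 99.9%'iles at 20% utilization can show latencies hundreds of times worse at the saturation levels benchmarks love to spend their time at.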