A disclaimer: I did not participate in the work of releasing the 2011 trace (I had not joined Google back then). I did, however, work on preparing and publishing the 2019 trace. I do not know many details about the 2011 trace, so some of the descriptions below are my speculation.
It is indeed abnormal to see the cpu rate or maximum CPU rate columns having values > 1.0. To deal with such abnormal data, the first question I would ask is: how often does it happen? Through a quick query on BigQuery (yes, we also imported the 2011 trace into BigQuery, which is a huge productivity boost), I found that in the task_usage table there are 583 rows out of 1232799308 with cpu_rate > 1.0. That should be considered really rare (about 5 x 10^-7 of all rows), and it is likely to be some noise introduced during data collection.
Just looking at the samples you provided, I noticed that the duration of each sample (end_time - start_time) is usually less than 5 min. This means there is actually very little data collected by the node agent in Borg (which we call the Borglet) within such a window. If I add this condition to my query to see how many rows have cpu_rate > 1.0 AND (end_time - start_time) >= 300000000, there are actually only 4 rows that satisfy it.
If we change the filter to count how many rows have maximum_cpu_rate > 1.0, the number is larger: 800845 out of 1232799308 samples. If we also add the condition to only inspect samples with duration >= 5 min, the number goes down to 765805. That is still only about 0.06%, which should be considered rare.
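In case it helps to reproduce these counts, below is a rough sketch of this kind of query, written as a small Python program using the BigQuery client library. The table name my_project.clusterdata_2011.task_usage is only a placeholder (our import of the 2011 trace is internal), and the column names simply follow the names used above; adjust both to match wherever and however you loaded the task_usage CSVs.

from google.cloud import bigquery

# Placeholder table: point this at your own import of the 2011 task_usage
# CSVs, and rename the columns below to match your schema.
TABLE = "my_project.clusterdata_2011.task_usage"

query = f"""
SELECT
  COUNT(*) AS total_rows,
  COUNTIF(cpu_rate > 1.0) AS cpu_rate_gt_1,
  COUNTIF(maximum_cpu_rate > 1.0) AS max_cpu_rate_gt_1,
  COUNTIF(maximum_cpu_rate > 1.0
          AND end_time - start_time >= 300000000) AS max_gt_1_full_window
FROM `{TABLE}`
"""

client = bigquery.Client()
row = list(client.query(query).result())[0]
print(row.total_rows, row.cpu_rate_gt_1,
      row.max_cpu_rate_gt_1, row.max_gt_1_full_window)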
Given how rarely this happens, it could be caused by lots of things: normal noise introduced by the data collection procedure, the node agent (the Borglet) being unresponsive, kernel bugs, bad hardware, etc.
Regarding the noise introduced by data collection, here is a possible scenario that could make it happen:
Recall the counter maintained by the kernel that we use to collect CPU usage data; to quote from my previous email:
> We use cgroup to isolate and monitor tasks. Each task consists of a set of processes. For each cpu cgroup, the Linux kernel maintains a counter that keeps increasing. Whenever a process running in a cgroup (i.e. a task) has used CPUs, the OS would increase the counter of the cgroup by the amount of time that the task has been running on the CPUs. So underlying Borg, the CPU usage of a task is maintained by the Linux kernel and it is a monotonically increasing integer. Every second, Borg checks the counter for every task and takes the difference between the two consecutive measurements. The difference gives us the amount of time that a task has been running on CPUs between the two measurements (in Borg, it's around 1 second).
Just imagine how you would implement such a data collection procedure. In an ideal world where you want truly accurate data, you would pause the whole system, meaning that no one except the data collection program can use the CPUs. Then you read the counter of each task, take the difference between two consecutive reads, divide it by the duration between the two consecutive pauses, and you get the CPU rate. However, in a real system, you can never pause every process every second simply for the sake of slightly higher-quality data. What we actually do is read the counters sequentially (with some level of parallelism, of course) while the tasks keep running, and also read the current time. This means you never know exactly when you read a counter: between the moment you read the time and the moment you read the counter, the counter's value may have changed. So when you compute the rate by dividing by the duration, it is never exact, because the duration you divide by is not the duration between two reads of the kernel's CPU usage counter; rather, it is just the duration between two reads of the time. A Python sketch of this logic could be as follows:
import time

def now():
    # Wall-clock timestamp in seconds.
    return time.time()

def read_task_cpu_usage(i):
    # Placeholder: return the kernel's cumulative CPU usage counter for
    # task i, in seconds (e.g. read from the task's cgroup); details omitted.
    ...

# There are N tasks running on the machine.
N = 10
last_read_time = now()
last_cpu_usage_counter_value = [0.0] * N
cpu_rate = [0.0] * N

while True:
    # Sleep for one second.
    time.sleep(1)
    # Timestamp in seconds.
    current_time = now()
    # Imagine what would happen if the program stalled here:
    # the other tasks are still running, so their CPU usage
    # counters keep increasing.
    duration = current_time - last_read_time
    last_read_time = current_time
    for i in range(N):
        current_cpu_usage_counter = read_task_cpu_usage(i)
        cpu_rate[i] = (current_cpu_usage_counter
                       - last_cpu_usage_counter_value[i]) / duration
        last_cpu_usage_counter_value[i] = current_cpu_usage_counter
For simplicity, I just assume there are always N tasks running on the machine, so we can keep a fixed-length array to store the properties of each task. Because the tasks keep running after duration is calculated in each iteration of the while loop, it is possible for cpu_rate to come out larger than it should be. One may argue that we can solve this by placing the call to now() more carefully, and/or by maintaining a separate last_read_time for each task. But as long as time elapses between calling now() and reading a task's CPU usage counter, there is always a possibility that the calculated duration is smaller than the actual duration between the two reads of the counter. In such cases, the calculated CPU rate is inflated. In most real scenarios, such noise is negligible. But in rare cases, for example when the data collection agent is stalled after taking the current time but before reading the counters, the noise can be very large. In some cases, it may push the CPU rate above the machine's underlying capacity.
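To make this concrete, here is a tiny numeric illustration with made-up numbers:

# Made-up numbers for illustration. Suppose a task uses 0.9 CPUs steadily.
true_cpu_rate = 0.9

# Previous iteration: now() and the counter read both happened at t = 0 s.
# Current iteration: now() fires at t = 1.0 s, so the recorded duration is 1.0 s.
recorded_duration = 1.0

# The agent then stalls for 0.4 s before it actually reads the counter, so the
# counter delta covers 1.4 s of wall-clock time, not the recorded 1.0 s.
stall = 0.4
counter_delta = true_cpu_rate * (recorded_duration + stall)  # 1.26 CPU-seconds

measured_cpu_rate = counter_delta / recorded_duration
print(measured_cpu_rate)  # 1.26 -- above 1.0, although the task only ever used 0.9 CPUs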
The reason that maximum_cpu_rate has more abnormal rows than cpu_rate is simply that cpu_rate is averaged over the 5-minute window, while maximum_cpu_rate is the maximum of the per-second CPU rates collected within those 5 minutes.
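As a toy illustration (again with made-up numbers) of why the maximum is affected so much more often than the average:

# 300 per-second samples in a 5-minute window; a single sample is inflated by
# the kind of measurement noise described above.
rates = [0.5] * 299 + [1.3]

cpu_rate = sum(rates) / len(rates)  # ~0.503: the 5-min average barely moves
maximum_cpu_rate = max(rates)       # 1.3: the reported maximum crosses 1.0

print(cpu_rate, maximum_cpu_rate)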
I hope the explanation is helpful. In short: it is rare enough that it can safely be considered noise.