Dear all,
Since some of our jobs are restarted (due to preemption, etc.), we use the cumulative attributes CumulativeRemoteUserCpu and CumulativeRemoteSysCpu for CPU time accounting. If we understand the documentation correctly, RemoteUserCpu and CumulativeRemoteUserCpu should differ only if the job has been evicted and restarted; in other words, if NumJobStarts > 1. Is that correct?
However, we are seeing that for some jobs, CumulativeRemoteUserCpu and RemoteUserCpu are not equal, even though NumJobStarts = 1.
Furthermore, the wallclock attributes for these jobs show the same pattern: RemoteWallclockTime is higher than LastRemoteWallclockTime, yet NumJobStarts remains 1.
Parallel universe jobs (JobUniverse 11) are excluded from the query below.
# condor_history -const 'RemoteUserCpu =!= CumulativeRemoteUserCpu && JobUniverse =!= 11' -limit 10 -af NumJobStarts NumShadowStarts RemoteUserCpu CumulativeRemoteUserCpu RemoteWallclockTime LastRemoteWallclockTime
1 2 8113250.0 12396682.0 1036767.0 488485.0
1 2 8221022.0 12568705.0 1036818.0 484716.0
1 2 7993956.0 12169316.0 1036764.0 483220.0
1 2 81616095.0 94608714.0 345605.0 289182.0
1 2 65189347.0 77678751.0 276634.0 223331.0
1 4 8034448.0 14583796.0 1036831.0 191250.0
1 2 8970036.0 16283310.0 1036801.0 191247.0
1 3 23161184.0 42148010.0 1036833.0 189642.0
1 2 72414226.0 96255674.0 309464.0 204569.0
1 2 53818257.0 66894517.0 228412.0 172183.0
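
In case it helps to reproduce the check, here is a minimal Python sketch of the consistency test we have in mind, applied to three of the rows above. It assumes the column order of the -af arguments in the command above; the names FIELDS, SAMPLE, parse_rows and unexpected are ours, purely for illustration, and are not part of HTCondor.

```python
# Column order follows the -af arguments of the condor_history command above.
FIELDS = (
    "NumJobStarts",
    "NumShadowStarts",
    "RemoteUserCpu",
    "CumulativeRemoteUserCpu",
    "RemoteWallclockTime",
    "LastRemoteWallclockTime",
)

# Three rows copied from the output above (illustrative subset).
SAMPLE = """\
1 2 8113250.0 12396682.0 1036767.0 488485.0
1 4 8034448.0 14583796.0 1036831.0 191250.0
1 3 23161184.0 42148010.0 1036833.0 189642.0
"""


def parse_rows(text):
    """Turn whitespace-separated -af output into a list of dicts."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # skip blank or malformed lines
        rows.append(dict(zip(FIELDS, (float(p) for p in parts))))
    return rows


def unexpected(rows):
    """Rows where the cumulative and last-run values differ although
    NumJobStarts says the job started only once."""
    return [
        r
        for r in rows
        if r["NumJobStarts"] <= 1
        and (
            r["CumulativeRemoteUserCpu"] != r["RemoteUserCpu"]
            or r["RemoteWallclockTime"] != r["LastRemoteWallclockTime"]
        )
    ]


for r in unexpected(parse_rows(SAMPLE)):
    extra_cpu = r["CumulativeRemoteUserCpu"] - r["RemoteUserCpu"]
    extra_wall = r["RemoteWallclockTime"] - r["LastRemoteWallclockTime"]
    print(
        f"starts={r['NumJobStarts']:.0f} shadows={r['NumShadowStarts']:.0f} "
        f"extra_user_cpu={extra_cpu:.0f}s extra_wallclock={extra_wall:.0f}s"
    )
```

All ten rows above are flagged by this test. NumShadowStarts is also greater than 1 in every one of them, although we do not know whether that is related.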
Do you know what could be the reason for this?
We are running HTCondor 25.0.11 on our Access Points (AP) and Central Manager (CM), and 25.0.9 on the majority of our Execution Points (EP).
Best regards,
Carles
-- Carles Acosta i Silva
PIC (Port d'Informació Científica)
Campus UAB, Edifici D
E-08193 Bellaterra, Barcelona