Thank you very much for your answer. As the analysis progresses, I have some other questions:
1. I found that the resource usage of an allocset differs from the combined resource usage of the instances it contains. Is this difference caused by maintaining the allocset's resource allocation, by monitoring error, or by other factors?
2. For many retried instances, the numbers of submit, schedule, and terminate events (EVICT, FAIL, FINISH, KILL, LOST) do not match; in some cases there are fewer schedule events than submit and terminate events. I also observed sequences such as ...->SCHEDULE->UPDATE_RUNNING->SUBMIT. This leaves me confused about the instance retry mechanism and makes it difficult to obtain the scheduling time and execution time of an instance.
In summary, my questions are:
1) What are the possible reasons for the difference in resource usage between allocset and its contained instances?
2) What could cause the inconsistent numbers of submit, schedule, and terminate events for retried instances, and how should such event sequences be interpreted to understand instance retries?
3) Is it feasible to calculate the scheduling time and execution time of instances?
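
To make question 3 concrete, here is a minimal sketch of the kind of per-attempt calculation I have in mind. It rests on several assumptions that are mine and not confirmed by the trace documentation: events are given as hypothetical `(time, instance_id, event_type)` tuples with string event names, a SUBMIT always starts a new attempt, and any attempt without a complete SUBMIT -> SCHEDULE -> terminate sequence is simply dropped. The function name `attempt_durations` is illustrative only.

```python
from collections import defaultdict

# Terminate events as listed in the question above.
TERMINAL = {"EVICT", "FAIL", "FINISH", "KILL", "LOST"}


def attempt_durations(events):
    """Pair SUBMIT -> SCHEDULE -> terminate events per instance.

    `events` is an iterable of (time, instance_id, event_type) tuples;
    this layout is an assumption made for illustration.
    Returns {instance_id: [(scheduling_time, execution_time), ...]},
    with one tuple per complete attempt.
    """
    per_instance = defaultdict(list)
    for time, instance_id, event_type in events:
        per_instance[instance_id].append((time, event_type))

    result = {}
    for instance_id, evs in per_instance.items():
        evs.sort(key=lambda e: e[0])  # order by timestamp only
        submit_t = schedule_t = None
        attempts = []
        for time, event_type in evs:
            if event_type == "SUBMIT":
                # Assumption: a SUBMIT starts a new attempt, so any
                # unfinished attempt (e.g. SCHEDULE -> UPDATE_RUNNING
                # -> SUBMIT) is discarded rather than guessed at.
                submit_t, schedule_t = time, None
            elif event_type == "SCHEDULE":
                # Keep only the first SCHEDULE after a SUBMIT.
                if submit_t is not None and schedule_t is None:
                    schedule_t = time
            elif event_type in TERMINAL:
                # Record the attempt only if it was both submitted
                # and scheduled; otherwise it is skipped.
                if submit_t is not None and schedule_t is not None:
                    attempts.append((schedule_t - submit_t, time - schedule_t))
                submit_t = schedule_t = None
            # Other event types (e.g. UPDATE_RUNNING) are ignored.
        result[instance_id] = attempts
    return result
```

With this approach, an instance that is terminated without ever being scheduled contributes no durations, and an attempt interrupted by a new SUBMIT is lost. Whether those choices match the actual retry semantics is exactly what I am unsure about.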