Inquiry About How Data was Selected for Autopilot Memory Slack CDF's and Cluster Variation

William Meng

Oct 19, 2022, 4:44:56 PM
to Google cluster data - discussions
Dear All,

I had a couple of quick questions regarding how the 5000 Autopilot/No Autopilot jobs were selected to generate the memory slack CDFs (Fig. 3, 4) in the "Autopilot: workload autoscaling at Google" paper.

1. How exactly were these 5000 jobs selected? Were 5000 random jobs selected from all jobs that had Autopilot enabled (with the respective algorithm), and then another 5000 from jobs without Autopilot? If so, how is it determined whether a job has Autopilot enabled? Is this done randomly, or is there some logic that decides it, e.g. certain types of workloads have Autopilot enabled while others do not?

2. Regarding the 8 different clusters in the dataset, how are jobs assigned to these clusters? Is this done randomly, or is there some system such that a specific cluster is typically used to run a certain type of job?

Thanks for the help and clarification on these questions. I appreciate the information!

Sincerely,

Willie

Nan Deng

Jan 31, 2023, 6:59:54 PM
to Google cluster data - discussions
On Wednesday, October 19, 2022 at 1:44:56 PM UTC-7 wlme...@gmail.com wrote:
Dear All,

I had a couple of quick questions regarding how the 5000 Autopilot/No Autopilot jobs were selected to generate the memory slack CDFs (Fig. 3, 4) in the "Autopilot: workload autoscaling at Google" paper.

1. How exactly were these 5000 jobs selected? Were 5000 random jobs selected from all jobs that had Autopilot enabled (with the respective algorithm), and then another 5000 from jobs without Autopilot? If so, how is it determined whether a job has Autopilot enabled? Is this done randomly, or is there some logic that decides it, e.g. certain types of workloads have Autopilot enabled while others do not?

I did not work on the Autopilot paper, so you may want to contact the authors directly with those questions. Note that they may have used data that is outside the public cluster trace.

2. Regarding the 8 different clusters in the dataset, how are jobs assigned to these clusters? Is this done randomly, or is there some system such that a specific cluster is typically used to run a certain type of job?

Humans decide. Users choose where to run their jobs depending on the requirements of their services. We also have some automated tools that help users pick a cell, or even pick one automatically, given some requirements. But in the end, a human needs to specify at least some general requirements for cell selection.