Thank you.
I have a few follow-up questions; thanks in advance for any answers:
1) In AWX version 21.2.0 (updated to 21.3 last week), the task, ee and web containers all run within a single pod. That is not a problem, but my earlier question still stands: what are the resource requirements of the task and ee containers for? Where are these resources used?
What gets impacted if I reduce or increase resources given to these two containers?
- This is still a mystery to me because, as your response also mentioned, a job run uses resource values that are either the defaults or customized under the container group. If that is true, where are the 'task' and 'ee' resources being used?
2) The error I consistently get in one of my jobs - 'Task was marked as running but was not present in the job queue, so it has been marked as failed' - occurs when I connect to 27 VMs and, from each VM, run a few commands on 100 network devices.
The output is getting registered in a 'dictionary' (per inventory host).
- This issue is consistently reproducible with the number of devices mentioned. I now suspect the job is running out of memory or CPU somewhere as the number of dictionaries grows. The same job runs fine every time with fewer devices (I haven't rigorously tested the limits).
- That is why I'm trying to understand which resources and limits might be contributing to my issue.
- I haven't created an issue on GitHub since I saw that one already exists, without much progress or many updates.
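
To put the scale from question 2 in numbers, here is a rough back-of-the-envelope sketch. The 27 VMs and 100 devices per VM come from my job; the 50 KB of registered output per device is purely an assumed figure that I have not measured, and the function name is just illustrative.

```python
# Rough estimate of how much registered output the job may hold at once.
VMS = 27                     # inventory hosts in the job
DEVICES_PER_VM = 100         # network devices reached from each VM
ASSUMED_KB_PER_DEVICE = 50   # hypothetical size of one device's output, not measured


def estimate_registered_output(vms, devices_per_vm, kb_per_device):
    """Return (number of results, total size in MiB) for all registered output."""
    total_results = vms * devices_per_vm
    total_mib = total_results * kb_per_device / 1024
    return total_results, total_mib


total_results, total_mib = estimate_registered_output(
    VMS, DEVICES_PER_VM, ASSUMED_KB_PER_DEVICE
)
print(f"{total_results} results, about {total_mib:.1f} MiB before any overhead")
```

The real footprint depends on the actual output size and on how the results are serialized, so this only shows how quickly the total grows with the device count.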
3) What is the role of the 'control plane' instance group? It shows values for max forks and so on. Are they relevant on K8s?
Regards,
Deepanshu