'Error opening pod stream' or 'Stream error running pod' randomly on some jobs

MrMel94

Sep 16, 2022, 8:15:35 AM
to AWX Project

Hi, I have a k3s cluster with AWX installed on it. Everything works fine except one thing (and it is a very important one).

Sometimes, at random, a job will not start, and its output shows this error:

Error opening pod stream: Get "https://ip_of_my_server:10250/containerLogs/awx/automation-job-567099-7hw6l/worker?follow=true": EOF

or more rarely

Stream error running pod: stdin: error dialing backend: EOF, stdout: http2: response body closed

But if I relaunch the job, it starts and runs normally.
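Since a relaunch succeeds, one possible workaround is to retry only the jobs that fail with these specific errors. The following is a minimal Python sketch under stated assumptions: `launch_job` is a hypothetical callable (for example, a wrapper you would write yourself around the AWX API) that runs one job to completion and returns its status and stdout. The error patterns are copied from the messages above; treating them as safe to retry is an assumption, and it is only reasonable if your playbooks are idempotent.

```python
import re
from typing import Callable, Tuple

# Error signatures quoted in this thread. Treating them as safe to retry is an
# assumption: it holds only if the job failed before doing any real work.
TRANSIENT_PATTERNS = (
    re.compile(r"Error opening pod stream: .*EOF"),
    re.compile(r"Stream error running pod: .*error dialing backend: EOF"),
)


def is_transient_pod_stream_error(output: str) -> bool:
    """Return True if the job output contains one of the pod stream errors."""
    return any(p.search(output) for p in TRANSIENT_PATTERNS)


def run_with_retry(launch_job: Callable[[], Tuple[str, str]],
                   max_attempts: int = 3) -> Tuple[str, int]:
    """Relaunch a job while it fails with a pod stream error.

    launch_job is a hypothetical callable that starts one job, waits for it to
    finish, and returns (status, stdout). Returns (final status, attempts used).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, output = launch_job()
        # Stop on success, or on any failure that is not a pod stream error.
        if status != "failed" or not is_transient_pod_stream_error(output):
            return status, attempt
    return status, max_attempts


if __name__ == "__main__":
    # Fake job results: the first launch hits the EOF error, the second succeeds.
    results = iter([
        ("failed", 'Error opening pod stream: Get "https://host:10250/containerLogs/...": EOF'),
        ("successful", "PLAY RECAP ..."),
    ])
    print(run_with_retry(lambda: next(results)))  # prints ('successful', 2)
```

Any other failure (an unreachable host, a failed task) is returned immediately rather than retried, so genuine playbook errors still surface.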

I noticed that it happens when several jobs are running at the same time (10 or more), but not always. For example, I have a big workflow with 32 jobs that start at the same time, launched twice every morning, at 7:00 AM and 8:00 AM. Sometimes both runs succeed; other times 2 or 3 jobs fail in the first run, but the second run is fine. It's very weird.

I looked for a way to limit the number of jobs running in parallel in AWX but couldn't find one, and the number of forks in my AWX configuration is 100. My server has 12 CPUs and 20 GB of RAM, so I don't think it's a resource issue.
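Two points may help here. First, `forks` limits how many hosts a single job's playbook works on in parallel; it does not cap how many jobs run at once. Second, if jobs are launched from an external script rather than from an AWX schedule, the script itself can cap concurrency. The sketch below assumes each job is wrapped in a hypothetical zero-argument callable that launches one AWX job and waits for it; `launch_bounded` and `max_parallel` are illustrative names, not AWX settings.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def launch_bounded(jobs: Sequence[Callable[[], str]], max_parallel: int = 5) -> List[str]:
    """Run job callables with at most max_parallel in flight; results keep input order."""
    # The executor never runs more than max_workers callables at the same time.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(lambda job: job(), jobs))


def demo(n_jobs: int = 32, max_parallel: int = 5) -> int:
    """Launch n_jobs fake jobs and return the peak number running at once."""
    lock = threading.Lock()
    state: Dict[str, int] = {"in_flight": 0, "peak": 0}

    def make_job(i: int) -> Callable[[], str]:
        # Hypothetical stand-in for "launch AWX job i and wait for it to finish".
        def job() -> str:
            with lock:
                state["in_flight"] += 1
                state["peak"] = max(state["peak"], state["in_flight"])
            time.sleep(0.01)  # simulated job runtime
            with lock:
                state["in_flight"] -= 1
            return f"job-{i}: successful"
        return job

    results = launch_bounded([make_job(i) for i in range(n_jobs)], max_parallel)
    assert len(results) == n_jobs
    return state["peak"]


if __name__ == "__main__":
    print("peak concurrency:", demo(n_jobs=32, max_parallel=5))
```

With this approach, the 32 jobs from the example above would run in batches of at most `max_parallel`. Whether that actually avoids the EOF errors is untested here; it only removes the burst of simultaneous pod starts.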

My K3s version: v1.24.4+k3s1
My AWX version: 21.5.0
My OS: CentOS Linux release 7.9.2009

Does anyone have a solution or a workaround for this? It's a real problem when my workflows run: a whole workflow fails because a single job hits this error.

Thanks

m.ne...@cityscoot.eu

Sep 16, 2022, 9:09:56 AM
to awx-p...@googlegroups.com
Hi,
I had similar behavior, so I downgraded AWX, but I still had the issue, just less frequently.
For now I am running:
K3s: v1.21.9+k3s1
operator: 0.26.0
AWX: 21.4.0

I have not experienced the issue since, with several scheduled jobs running, some of them every 5 minutes.

My setup can be found here: https://github.com/antuelle78/deploy-awx-k3s-ubuntu

Regards,
Antuelle78

AWX Project

Nov 2, 2022, 2:55:45 PM
to AWX Project
If there is any kind of hiccup while transmitting the inputs to ansible-runner in the pod, we cannot resume the connection, because the ansible-runner process's stdin will be polluted. Because of this, we have to fail the job.
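A toy model makes the reason concrete. Here an in-memory buffer stands in for the stdin of the ansible-runner process, and the payload is a made-up JSON document, not ansible-runner's real input format. If the connection drops part-way and the sender simply resends from the beginning, the receiver sees the partial bytes followed by the full payload, which no longer parses.

```python
import io
import json
from typing import Optional


def deliver(stdin: io.BytesIO, payload: bytes, cut_at: Optional[int] = None) -> None:
    """Write payload to stdin; if cut_at is set, simulate a drop after cut_at bytes."""
    stdin.write(payload if cut_at is None else payload[:cut_at])


def receiver_parses(stdin: io.BytesIO) -> bool:
    """Check whether everything written to stdin forms one valid JSON document."""
    try:
        json.loads(stdin.getvalue())
        return True
    except json.JSONDecodeError:
        return False


# Made-up job input; not ansible-runner's real transmit format.
payload = json.dumps({"playbook": "site.yml", "extravars": {"x": 1}}).encode()

clean = io.BytesIO()
deliver(clean, payload)
print("clean transfer parses:", receiver_parses(clean))    # True

broken = io.BytesIO()
deliver(broken, payload, cut_at=10)  # connection drops mid-transfer
deliver(broken, payload)             # naive "resume": resend from the start
print("resent transfer parses:", receiver_parses(broken))  # False
```

The bytes already consumed by the receiving process cannot be taken back, so the only safe outcome is to fail the job and let it be relaunched with a fresh process.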

-The AWX Team