AWX Jobs Constantly Failing

Polonius

Jul 7, 2022, 7:22:30 AM
to AWX Project
I have migrated AWX to a new K3s cluster, and I now get the following error on almost every job run:

Error opening pod stream: Get "https://awx.k3s.net:10250/containerLogs/awx/automation-job-12053-blqxd/worker?follow=true": EOF

Sometimes there is no error at all; the job simply has a status of Failed with:

"No output for this job"

What could be causing this?

Polonius

Jul 7, 2022, 7:43:43 AM
to AWX Project
Also, I can see the automation pods being created; they are then terminated with the reason 'Killing'.

I cannot see any OOM or CPU errors, there is plenty of overhead resource wise.

Wei-Yen Tan

Jul 7, 2022, 10:44:49 AM
to awx-p...@googlegroups.com
I encountered this too, although on RKE2. What version of AWX are you running?

--
You received this message because you are subscribed to the Google Groups "AWX Project" group.
To unsubscribe from this group and stop receiving emails from it, send an email to awx-project...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/awx-project/7efe1ac9-5ac2-46a3-9490-fcff52b7c5c8n%40googlegroups.com.

Matt Page

Jul 8, 2022, 6:29:10 AM
to awx-p...@googlegroups.com
Glad to know I'm not the only one. Versions:

AWX - 21.2.0
Operator - 0.23.0
K3s - v1.23.7+k3s1

AWX Project

Jul 13, 2022, 3:56:11 PM
to AWX Project
Hello!

A similar issue was reported here: https://github.com/ansible/awx/issues/12288

It sounds like an issue that occurs when too many jobs are running at the same time and resource limits are being hit.

| I cannot see any OOM or CPU errors, there is plenty of overhead resource wise.

What metrics are you using from the k8s engine to see overhead?

AWX Team

Matt Page

Jul 14, 2022, 5:27:31 AM
to awx-p...@googlegroups.com
Hello

Thanks for getting back to me.

This appears to occur only when I run a playbook that requires dynamic inventories to be updated. The inventories (3 sources) appear to be updated one at a time: each inventory source creates a pod for its update, and the next source waits for the previous one to finish before starting its own pod.

I am using k9s, which I believe gets its metrics from the Kubernetes metrics API. Before the playbook runs, it shows CPU at 5% and memory at 49%, and there are no spikes when the errors occur.

I must also mention that no other AWX tasks are running at the time.

Regardless, I have increased the memory and CPU of the nodes, and alas, the same issue still occurs.

Thanks

Matt Page

Jul 18, 2022, 6:19:08 AM
to awx-p...@googlegroups.com
Hi

Any idea where to take this?

Thanks

AWX Project

Jul 20, 2022, 3:44:38 PM
to AWX Project
| I have migrated AWX to a new K3s cluster

What k8s cluster (and version) were you using previously, where AWX was working?

Is there anything special in your awx-demo.yml (or equivalent)? Feel free to copy and paste it here (please remove sensitive info first).

AWX Team

Matt Page

Jul 29, 2022, 5:47:02 AM
to awx-p...@googlegroups.com
Unfortunately, I can't remember the K3s version, and that cluster is now gone.

I can say the old version of AWX was 19.5.0 (operator 0.15.0).

You can find my configuration in the GitHub issue:


Thanks

AWX Project

Aug 5, 2022, 1:20:40 PM
to AWX Project
Hi Polonius,

Since we don't have much insight into your environment, are you able to try out AWX in another isolated k8s environment, such as minikube?

If so, we kindly ask that you do so and let us know whether the issue persists. Try running the out-of-the-box (OOTB) demo project on a fresh instance of AWX. This might help us discern whether the issue lies within our product or in your environment.

Thanks,
AWX Team

Matt Page

Aug 9, 2022, 5:37:10 AM
to awx-p...@googlegroups.com
Hi

Apologies for the delay.

To be frank, I don't see the point in doing this; I assume you already know that OOTB AWX works on a minikube instance.

What I was hoping for in raising this issue is some pointers as to where I might be able to look to help identify where the issue is coming from.

Thanks

Wei-Yen Tan

Aug 9, 2022, 6:32:47 AM
to awx-p...@googlegroups.com
I took another look at your issue. I solved my problem by upgrading the Kubernetes version. I suggest you upgrade K3s to the latest version first.

Matt Page

Aug 9, 2022, 8:01:43 AM
to awx-p...@googlegroups.com
Thanks, but I'm afraid I'm not able to upgrade K3s, as another app depends on the earlier version.

Wei-Yen Tan

Aug 9, 2022, 8:04:30 AM
to awx-p...@googlegroups.com
Then unfortunately I can't help you with that. The newer versions of AWX require a certain version of Kubernetes that is only available by upgrading to the latest version of K3s.

The only option I can suggest is to remain on AWX 19.5, which I think is the version that works with your version of K3s, until you are ready to move up.
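As an illustrative sketch only (not something posted in this thread): one way to check a cluster's reported version, such as the `v1.23.7+k3s1` listed earlier (shown, for example, in the VERSION column of `kubectl get nodes`), against a minimum you have looked up yourself. The helper names `parse_k3s_version` and `meets_minimum` are hypothetical, and the thread does not say which Kubernetes version newer AWX releases actually need, so the threshold is always supplied by the caller.

```python
import re

def parse_k3s_version(version: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from a string like 'v1.23.7+k3s1' or '1.24.0'."""
    # Optional leading 'v', three numeric parts, optional '+k3sN' suffix.
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)(?:\+k3s\d+)?", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)

def meets_minimum(cluster_version: str, minimum: str) -> bool:
    """Return True if cluster_version is at least `minimum`.

    `minimum` is a caller-supplied placeholder; no AWX requirement is assumed here.
    """
    # Tuples compare element by element: major, then minor, then patch.
    return parse_k3s_version(cluster_version) >= parse_k3s_version(minimum)
```

For example, `meets_minimum("v1.23.7+k3s1", "1.24.0")` returns `False`, where `1.24.0` is purely a hypothetical threshold. Comparing integer tuples rather than raw strings avoids the trap where `"1.9"` sorts after `"1.23"` lexically.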

Павлов Александр

Aug 9, 2022, 8:24:45 AM
to awx-p...@googlegroups.com
Which version of k8s is required and why?

Wei-Yen Tan

Aug 9, 2022, 8:29:22 AM
to awx-p...@googlegroups.com
It's the latest one. I am just sharing my experience. I had an older version of Kubernetes. Then, as I upgraded AWX to later versions, I increasingly hit the errors the OP mentioned. I then read a changelog that mentioned this and advised upgrading to a later version.

The moment I upgraded, the problems went away immediately.

Matt Page

Aug 9, 2022, 8:51:44 AM
to awx-p...@googlegroups.com
Thanks, that is good information. Unfortunately now I need to find a way to downgrade AWX... I imagine this is not easy.

Does AWX publish a Kubernetes version requirement? I can't seem to find one.

Wei-Yen Tan

Aug 9, 2022, 8:58:31 AM
to awx-p...@googlegroups.com
I kept my AWX configuration as code using the redhat-cop controller configuration collection. Then I just wiped my AWX, deployed an older version, and ran the controller configuration against my AWX installation. Boom: all my AWX configuration came back.

Wei-Yen Tan

Aug 9, 2022, 8:59:01 AM
to awx-p...@googlegroups.com
That's my suggestion to you 

Matt Page

Aug 9, 2022, 10:06:06 AM
to awx-p...@googlegroups.com
Thanks for the tip. I see passwords aren't exported by the tool you mentioned (and there are too many to repopulate), so I have opted to try upgrading the cluster instead.

AWX team, is there any documentation on supported Kubernetes versions? I can't find any myself.

Thanks

Wei-Yen Tan

Aug 9, 2022, 10:14:58 AM
to awx-p...@googlegroups.com
Yes, they are. I am talking about the collection, not awx-cli export.

Wei-Yen Tan

Aug 9, 2022, 11:07:03 AM
to awx-p...@googlegroups.com
Come to think of it, have you tried a fresh install of AWX, including destroying the PostgreSQL persistent volume and then reinstalling?

Matt Page

Aug 10, 2022, 4:59:28 AM
to awx-p...@googlegroups.com
I got the same error on the latest k3s version, so now I will try a fresh installation using the redhat-cop method you suggested.

I cannot see how to export the configuration using the redhat-cop controller collection. Any pointers?

Wei-Yen Tan

Aug 10, 2022, 5:03:53 AM
to awx-p...@googlegroups.com
I suggest you look over this repository. 



I put my details in code as YAML (basically vars), and then the example playbook imports all of this into AWX.

Matt Page

Aug 10, 2022, 5:32:16 AM
to awx-p...@googlegroups.com
Thanks, yes, I understand now. However, I would still have to enter all the credentials manually, and a fresh install isn't very practical given the number of credentials we have.

Given that I'm now on the latest version of K3s, I assume this must be database-related: old config hanging around from the migration, or something similar.

AWX team, I'd really appreciate it if you could point me to where I should look to get the current instance stable.

Thanks

Wei-Yen Tan

Aug 10, 2022, 11:43:05 AM
to awx-p...@googlegroups.com
You're welcome. Just out of curiosity, how many creds do you have, and of what types?

