Jobs failing with ERROR with partial log

Christian Chan

Mar 22, 2023, 12:16:27
– AWX Project
Hello,

We are experiencing an issue where jobs randomly fail with status ERROR and only partial logs. A failed job will eventually complete successfully if it is restarted enough times. The failures also do not seem to depend on run time: the same job can fail in less than 5 minutes with a couple of lines in the log, or after more than an hour with hundreds of lines in the log.
AWX version is 21.8.0 running on AKS 1.22.15.

If anyone has encountered similar behavior and has any ideas, I would be thankful.

AWX Project

Mar 22, 2023, 14:51:48
– AWX Project
Can you provide the output of /api/v2/jobs/<job_id> for one of the failed jobs (with sensitive info removed)?

We can take a look and see if anything stands out.
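
For anyone following along, here is a minimal sketch of one way to pull that output and mask fields before posting it. It assumes an OAuth2 token sent as a Bearer header; the helper names (`redact`, `fetch_job`) and the list of keys to mask are illustrative choices, not part of AWX, so adjust them to whatever your jobs actually contain.

```python
import json
import urllib.request

# Assumption: these job fields are the ones most likely to carry secrets.
# Extend or trim this set for your own environment.
SENSITIVE_KEYS = {"extra_vars", "job_env", "job_args", "artifacts"}


def redact(value, sensitive=SENSITIVE_KEYS, mask="<redacted>"):
    """Return a copy of `value` with sensitive keys masked.

    Recurses into nested dicts and lists; the input is not modified.
    """
    if isinstance(value, dict):
        return {
            key: mask if key in sensitive else redact(item, sensitive, mask)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item, sensitive, mask) for item in value]
    return value


def fetch_job(base_url, job_id, token):
    """GET /api/v2/jobs/<job_id>/ and return the parsed JSON body."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v2/jobs/{job_id}/",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example usage (hypothetical host, job id, and token):
#   job = fetch_job("https://awx.example.com", 1234, "my-oauth2-token")
#   print(json.dumps(redact(job), indent=2))
```

Review the redacted output by hand before attaching it, since secrets can also show up in fields this sketch does not mask.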

AWX Team

Christian Chan

Mar 23, 2023, 10:51:23
– AWX Project
Hi,

Attached is the output of a failed job, as requested.
AWX.txt

Tian Muc

Mar 31, 2023, 05:28:41
– AWX Project
I'm not sure if it's the same for you, but there was an issue with long-running jobs being terminated unexpectedly: https://github.com/ansible/awx/issues/11594

AWX Project

Mar 31, 2023, 14:01:32
– AWX Project
As Tian mentioned, you could be running into the timeout issues. These are resolved in the latest release. Would you mind upgrading to the latest AWX and retrying?

AWX Team

Christian Chan

Apr 3, 2023, 07:41:59
– AWX Project

Unfortunately, I don't think it is related to the linked timeout issue, as jobs can fail within minutes.
As for upgrading, we have to discuss this with our client, as the final decision is theirs.
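
To back up the "fails within minutes" observation with numbers, a rough sketch like the one below could split failed jobs by how long they ran. It assumes job records shaped like the entries returned by /api/v2/jobs/?status=error, each with `id`, `status`, and `elapsed` (seconds); the function name and the threshold are hypothetical, and the sample records are made up for illustration.

```python
def split_by_elapsed(jobs, threshold_s):
    """Split errored jobs into (short, long) lists of job ids.

    A job counts as "short" if its elapsed time is below `threshold_s`.
    Jobs that are not in status "error" or have no elapsed value are skipped.
    """
    short_ids, long_ids = [], []
    for job in jobs:
        if job.get("status") != "error":
            continue
        elapsed = job.get("elapsed")
        if elapsed is None:
            continue
        if elapsed < threshold_s:
            short_ids.append(job["id"])
        else:
            long_ids.append(job["id"])
    return short_ids, long_ids


# Made-up sample records, not real data from this thread.
sample = [
    {"id": 101, "status": "error", "elapsed": 240.0},
    {"id": 102, "status": "error", "elapsed": 4100.5},
    {"id": 103, "status": "successful", "elapsed": 900.0},
]
short_ids, long_ids = split_by_elapsed(sample, threshold_s=600)
print("short failures:", short_ids)
print("long failures:", long_ids)
```

If most failures land in the short bucket, that would support the view that a long-running-job timeout is not the main cause.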