re-run failed task only

Kien Pham

Sep 5, 2014, 1:18:04 PM
to ansibl...@googlegroups.com
Hi all,

I would like to discuss whether Ansible should support re-running only the failed tasks. This is different from the suggested answer here:

It is similar to how we would run unit tests and only re-run the failed tests during development.

Here's the feature request. It got closed, but I would still like further discussion of this topic:
https://github.com/ansible/ansible/issues/8896

Thanks,
Kien

Jesse Keating

Sep 5, 2014, 1:43:56 PM
to ansibl...@googlegroups.com
I’ve thought about this a bunch, but it’s really hard. Many of our tasks require data from previous tasks, and often those previous tasks will “succeed” in one run only to have a later dependent task fail. Figuring out which ones to re-run there is nearly impossible.
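
To make the dependency problem concrete, here is a minimal Python sketch. It is not Ansible code: the task functions, the `results` dict (standing in for registered variables), and the `run` helper are all hypothetical names made up for illustration.

```python
# Hypothetical model: each task reads and writes a shared dict, the way
# Ansible tasks can consume values registered by earlier tasks.

def fetch_version(results):
    # "Succeeds" and registers a value that a later task depends on.
    results["version"] = "1.2.3"

def deploy(results):
    # Depends on data produced by fetch_version.
    return "deploying " + results["version"]

def run(tasks):
    """Run the given tasks in order against a fresh results dict."""
    results = {}
    output = None
    for task in tasks:
        output = task(results)
    return output

# Full run: the dependency is satisfied.
full = run([fetch_version, deploy])

# Re-running only the task that failed: the registered value is missing.
try:
    run([deploy])
    failed_only_ok = True
except KeyError:
    failed_only_ok = False
```

In this toy model the second run fails for a reason unrelated to the original failure, which is the kind of gotcha described above.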

Since Ansible is designed to support idempotence, we make sure that we can re-run any of our playbooks at will. Tasks which have already completed will finish fast as ‘unchanged’ and only the tasks that haven’t run yet will cause new changes. Trying to bake something into Ansible to only re-run failed tasks is probably going to cause too many gotchas to be really useful.
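
As a rough illustration of what idempotence means here, consider this Python sketch. The `ensure_line` function and the list standing in for a config file are assumptions for the example, not an Ansible module.

```python
# Hypothetical idempotent task: ensure a line is present in a list that
# stands in for a config file on the managed host.

def ensure_line(config_lines, line):
    """Add `line` if it is missing; report whether anything changed."""
    if line in config_lines:
        return "unchanged"  # nothing to do, so the task finishes quickly
    config_lines.append(line)
    return "changed"

config = ["Port 22"]
first = ensure_line(config, "PermitRootLogin no")   # first run makes the change
second = ensure_line(config, "PermitRootLogin no")  # re-run is a no-op
```

The second call finds the desired state already in place and reports no change, which is why re-running a whole playbook is usually safe.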

For your own setup, you could make use of --start-at-task if you really know where you can skip ahead to.

-jlk

Kien Pham

Sep 5, 2014, 1:57:02 PM
to ansibl...@googlegroups.com
Hi Jesse,

Thanks for your response. I was looking at the implementation of --start-at-task, and it looks like it just reads the task name:

Then would it be possible to write the names of the failed tasks to a file and, with a flag such as --failed-tasks-only, parse that file at the same place where we handle start_at now?
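
A minimal Python sketch of the two selection strategies, to show how they differ. This is not Ansible's implementation: tasks are modelled as plain names, `select_from` and `select_failed` are hypothetical helpers, and the file of failed task names is a newline-separated string so the example stays self-contained.

```python
# Hypothetical sketch; not Ansible's implementation.

def select_from(task_names, start_at):
    """Model of --start-at-task: skip everything before the named task."""
    if start_at not in task_names:
        return []
    return task_names[task_names.index(start_at):]

def select_failed(task_names, failed_file_text):
    """Model of the proposed flag: keep only tasks listed as failed."""
    failed = {
        line.strip()
        for line in failed_file_text.splitlines()
        if line.strip()
    }
    return [name for name in task_names if name in failed]

tasks = ["install packages", "write config", "start service", "smoke test"]
failed_text = "write config\nsmoke test\n"

from_config = select_from(tasks, "write config")
only_failed = select_failed(tasks, failed_text)
```

Here `from_config` keeps every task from "write config" onward, while `only_failed` drops "start service" even though it sits between the two failed tasks. Jesse's point about tasks depending on earlier tasks applies to exactly that gap.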

Thanks,
Kien

Michael DeHaan

Sep 5, 2014, 3:03:58 PM
to Kien Pham, ansibl...@googlegroups.com
Rather than discussing the previous ticket (our reasons hold here), let's discuss the use case a bit first so we can get a greater understanding.

What is the task you are running and why do you need to rerun it?

That may lead to some modelling suggestions.





Kien Pham

Sep 5, 2014, 3:33:00 PM
to Michael DeHaan, ansibl...@googlegroups.com
Michael,

It's more for development. It's like unit testing: you write tests, expect some to fail, and re-run only the failed tests until everything is green.

The same thing applies to Ansible tasks. I might have 100 tasks, of which only 1 or 2 failed. Now I have to re-run all 100 tasks just to check whether I have fixed the 2 that failed. It would be awesome if I could run just the 1 or 2 failed tasks to verify the fix quickly during development. We actually spend a lot of time developing these playbook tasks to get them right.

--
Kien Pham
Software Engineer, R&D SendGrid


Michael DeHaan

Sep 5, 2014, 3:39:10 PM
to Kien Pham, ansibl...@googlegroups.com
Well, the problem is that if you just re-run the failed parts, you won't be validating that the previous steps can run cleanly a second time, right? In that case, running them again makes sense, as it will just go over the server policy and check that everything is up to date.

I understand what you are saying about targeting specific parts of the config, and I like tagged roles pretty well for that kind of thing.

Some people like --start-at-task, which sounds like it will do what you want: start at that particular point. I don't use it, though.


Kien Pham

Sep 5, 2014, 3:45:00 PM
to Michael DeHaan, ansibl...@googlegroups.com
Hi Michael,

I think I'm OK with --start-at-task for now. Basically, if task #50 out of 100 failed, I would already cut about half of the runtime.

My point to Jesse is that, since --start-at-task is already implemented, supporting something like --start-failed-task-only doesn't seem very complicated.

Thanks,

--
Kien Pham
Software Engineer, R&D SendGrid



Michael DeHaan

Sep 5, 2014, 5:08:03 PM
to Kien Pham, ansibl...@googlegroups.com
Right now the retry file doesn't record this; it just produces a file for use with "--limit @filename.yml". If it did record the failed tasks, it might be more straightforward to make this an option, but we'd need something like a --retry-file flag.



