variable precedence seems hopelessly broken

Dan Stillman

Aug 19, 2015, 10:02:19 PM

to ansible...@googlegroups.com

I really like Ansible and have built a large infrastructure around it,
but I'm finding it untrustworthy to the point of being unusable.

In the last 9 months, I've reported 4 variable precedence bugs:

https://github.com/ansible/ansible/issues?utf8=%E2%9C%93&q=is%3Aissue+author%3Adstillman+

The first two were marked as P1 and fixed, and the third was confirmed
as P2 in December but remains open. The last one, which I reported
today, occurs in 1.9.3 but is fixed on devel for 2.0 — and yet devel
appears to break one of the P1 bugs (#9498) again, despite my including
a test case with the original report (as I've done for all of them). The
other P1 bug also disappeared and reappeared a couple times during 1.8
development as other variable bugs were fixed, which seems to be the
general pattern for these bugs.

If it's not clear, these are incredibly dangerous bugs in production
environments, because they can cause services to silently be rolled out
in the wrong location or with the wrong configuration. (I noticed this
because a service had been deployed to a directory with the name of
another service, resulting in two copies of the service trying to run —
though fortunately this was on a dev machine.) The safest solution I've
found is to configure different roles on the systems separately using
tags, but that somewhat defeats the purpose of a central configuration
management tool (and actually doesn't even avoid the P1 bug that's
broken again on devel, so I guess I should say the safest solution is
not to use variables at all).

It's possible I'm using variables somewhat differently than most people
using Ansible — the bugs I've reported all depend on include_vars within
a role, which I use extensively — but there seem to be quite a few
reports of variable bugs, and none of the issues I've reported have been
marked as invalid.

I don't want to abandon Ansible, but I can't keep using it if I can't
trust it to deploy services correctly. I also shouldn't have to keep my
own set of tests that I run whenever I try a new version just to make
sure dangerous bugs that I've reported previously — with those same
tests — haven't regressed.

If the current variable precedence system is salvageable (and I'm not
convinced it is or should be), it seems like many more integration test
cases are needed, all run in separate processes and — needless to say —
with new ones added whenever variable bugs are found.

(I think a contributing factor here may actually be the layout of the
integration test suite. Most of the test cases I've submitted require
multiple roles, but adding those to the current suite would get messy
quickly, since there's just a single root directory and single roles
directory for all integration tests. I think it'd be much cleaner to use
a subdirectory for each integration test, with a top-level playbook in
each, to keep all test files grouped together and avoid accidental
interactions with other files. That would also make it much simpler to
add people's test contributions.)
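
As a rough illustration of the per-directory layout suggested above, here is a minimal Python sketch in which every test runs in its own process. It assumes, hypothetically, that each test lives in a subdirectory of a root such as `integration/` and has a top-level playbook named `test.yml`. `PLAYBOOK_NAME`, `discover_tests`, `default_command`, and `run_all` are illustrative names, not part of Ansible's actual test suite. Because Ansible looks for a `roles/` directory next to the playbook, giving each test its own directory keeps its roles and vars files from colliding with those of other tests.

```python
import subprocess
from pathlib import Path

# Hypothetical convention: each test is a subdirectory of the root that
# contains a top-level playbook with this name, plus its own roles/ and
# vars files.
PLAYBOOK_NAME = "test.yml"


def discover_tests(root: Path) -> list[Path]:
    """Return every subdirectory of root that has a top-level playbook."""
    return sorted(p for p in root.iterdir()
                  if p.is_dir() and (p / PLAYBOOK_NAME).is_file())


def default_command(test_dir: Path) -> list[str]:
    # Assumes ansible-playbook is on PATH; targets only localhost over a
    # local connection. The playbook path is relative because the command
    # runs from inside test_dir.
    return ["ansible-playbook", "-i", "localhost,", "-c", "local",
            PLAYBOOK_NAME]


def run_all(root: Path, make_command=default_command) -> dict[str, int]:
    """Run each test in a separate process and collect exit codes by name."""
    results = {}
    for test_dir in discover_tests(root):
        # A fresh process per test, so no state leaks from one run to the next.
        proc = subprocess.run(make_command(test_dir), cwd=test_dir)
        results[test_dir.name] = proc.returncode
    return results
```

Calling `run_all(Path("integration"))` would then return a mapping such as `{"issue_9498": 0, ...}`, where a nonzero code marks a failing test. Under this layout, adding a contributed test case amounts to dropping its directory into the root.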

Anyway, I hope something can be done. As it stands now, I'm nervous
every time Ansible runs.

Dan Stillman

Sep 2, 2015, 12:31:56 PM

to ansible...@googlegroups.com

I haven't received a response on this, but since I posted it, the
various variable precedence bugs I've reported have continued to
reappear and disappear through successive commits. Usually when one is
fixed, another is broken. One issue [1] was even closed as a
"misunderstanding" (despite the supposedly correct behavior making very
little sense) before being acknowledged as a bug (and then being fixed,
and then regressing again), suggesting that Ansible developers aren't
even clear on how variables _should_ work. Currently, a number of the
bugs (including a P1 bug that was previously fixed [2]) are present in
devel.

I've provided simple test cases with every bug report and suggested a
reorganization of the test suite that would allow them to be easily
incorporated. I don't see any point in continuing to report these — or,
honestly, in continuing to try to use Ansible — if no effort is made to
ensure that these dangerous bugs stay fixed.

[1] https://github.com/ansible/ansible/issues/11996
[2] https://github.com/ansible/ansible/issues/9497

Greg DeKoenigsberg

Sep 3, 2015, 10:11:57 AM

to Ansible Project

We definitely recognize the concerns about variable precedence. Most
users don't have much to worry about, but some users with very complex
playbook structures can run into challenging corner cases with
variable precedence -- corner cases that have become more acute with
Ansible's rapid and somewhat organic growth.

One of the main goals of Ansible 2.0 is to solve this exact class of
problem. In the particular case of variable precedence, we're pursuing
the following design goals:

1. Limit variable precedence handling to a single section of the
codebase. That makes it harder for weird assignment changes to sneak
in. You can find that code in the VariableManager class [1].

2. Ensure that variable precedence is documented in great detail. In
the past, some details of precedence have been less clear than we
would have liked, so we're firming that up [2]. We will continue to
iterate over this definition until we're satisfied that it's correct,
and then we will document it officially.

3. Ensure that variable precedence is rigorously tested. Remember that
2.0 is still in alpha, and regressions should be temporary, so long as
you help us by reporting them. We do have some unit and integration
tests and we are working on cleaning them up, and we welcome more
tests not covered already by existing cases.

4. Ensure compatibility moving forward. Once we have proper
documentation and testing for variable precedence rules, we will be
able to introduce changes with a strong guarantee that those changes
will not break compatibility.
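
Goals 1 and 3 can be made concrete with a minimal Python sketch: a single function does all of the variable merging, and a small regression test pins one expected outcome. The source names in `PRECEDENCE` and their order are a hypothetical, simplified list chosen for illustration. They are not Ansible's documented precedence, and `resolve` is not the `VariableManager` API.

```python
from collections import ChainMap

# Hypothetical, simplified precedence list, lowest priority first.
# This is NOT Ansible's documented order; it only shows the ordering
# living in exactly one place.
PRECEDENCE = [
    "role_defaults",
    "inventory_vars",
    "play_vars",
    "include_vars",
    "extra_vars",
]


def resolve(sources: dict[str, dict]) -> dict:
    """Merge all variable sources in one place.

    For a key defined in several sources, the value from the source that
    appears later in PRECEDENCE wins.
    """
    unknown = set(sources) - set(PRECEDENCE)
    if unknown:
        # Reject sources the precedence list doesn't know about, rather
        # than silently guessing where they belong.
        raise ValueError(f"unknown variable source(s): {sorted(unknown)}")
    # ChainMap returns the value from the first mapping that has the key,
    # so list the highest-priority source first.
    layers = [sources.get(name, {}) for name in reversed(PRECEDENCE)]
    return dict(ChainMap(*layers))


def test_include_vars_overrides_role_defaults():
    # Regression test pinning one expected outcome under the list above.
    merged = resolve({
        "role_defaults": {"service_dir": "/srv/default"},
        "include_vars": {"service_dir": "/srv/service_a"},
    })
    assert merged["service_dir"] == "/srv/service_a"
```

A runner such as pytest collects `test_` functions automatically. In this sketch, each reported bug becomes one more small test against the single merge point, so a later change to the ordering can't silently undo an earlier fix.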

There is one problem that we will not be able to solve for everyone:
in past versions of Ansible, variable precedence has been subtly
different in some cases from release to release. For the vast majority
of users, those differences won't be a problem -- but in setting the
proper precedence behavior, once and for all, we may end up biting
users who settled on a previous version of Ansible with different
variable precedence behaviors. This is why documenting the proper
behavior, and sticking to it, is such a high priority for us -- we
want to ensure that anyone who has to pay a cost for fixing these
issues will only have to pay that cost once.

If you see variable precedence breakage in the 2.0 codebase, please
report it in GitHub! We can't guarantee that we will have a fix for
your particular breakage in 2.0, but we can guarantee that we will be
able to tell you why it's broken, and what the proper behavior will be
moving forward.

We know that our continued success depends upon providing a dependable
and transparent codebase that can be useful for everyone from the
novice to the power user. That's what the push to Ansible 2.0 is all
about. Thanks for sticking with us as we cover the last remaining
ground.

[1] https://github.com/ansible/ansible/blob/aeff960d028644c19dd845e51ced14a9bd3709c5/lib/ansible/vars/__init__.py#L46
[2] https://github.com/bcoca/ansible/commit/06969d92b6c9e429defa9295ce78487df8a7d084

--g

--
Greg DeKoenigsberg
Ansible Community Guy

Dan Stillman

Sep 4, 2015, 12:10:06 AM

to ansible...@googlegroups.com

On 9/3/15 10:11 AM, Greg DeKoenigsberg wrote:
> 3. Ensure that variable precedence is rigorously tested. Remember that
> 2.0 is still in alpha, and regressions should be temporary, so long as
> you help us by reporting them. We do have some unit and integration
> tests and we are working on cleaning them up, and we welcome more
> tests not covered already by existing cases.

Thanks for the response, Greg. This is the thing, though — I've provided
those tests, repeatedly, and they've never been used, even recently with
2.0. Developers have even asked me on GitHub whether bugs were still
present, which is kind of absurd when I've provided simple test cases. I
shouldn't have to run my own private test cases every time I try a new
version so that I can let the developers know if bugs have reappeared.

Here's an example of the kind of simple test cases I've provided:

https://github.com/ansible/ansible/issues/9498

That never should have been able to regress without developers noticing
(and given that the fix for the regression, two weeks ago, doesn't
appear to have been accompanied by any tests, I'm not particularly
confident that it won't again).

The existing variable precedence tests are clearly inadequate, and I
don't think there's any way they can be sufficient in the existing test
layout, with one variable precedence playbook and a handful of roles
mixed in haphazardly with all the other test files. I'd appreciate it if
someone could comment on my suggestion for reorganizing the test suite
into separate directories, with individual test cases and all their
related files grouped together (which is how I'm testing these locally,
after all). It would make it trivial to integrate all the test cases
I've provided — I could even provide pull requests — and would help with
keeping track of separate issues and documenting the expected behavior.
I don't really see another solution here.

- Dan
