ec2_group removed egress rules for internal IPs


Gregory Spranger

Jul 7, 2015, 11:18:35 AM
to ansible...@googlegroups.com

hi there,


so we had something odd happen to us and i figured i would reach out to the community for help .. here's the background:


1) ansible 1.9.1
2) we have a YML data set that contains "normal" info for SGs like ingress and egress rules
3) this data rarely changes
4) we run the ec2_group module on a regular basis since it is part of our "normal ansible runs" (4 times a day)

so this is where it starts to get "weird" .. our SGs show "changed" on every run, since we do NOT allow an "all traffic to cidr_ip 0.0.0.0/0" egress rule .. in the ansible code, i *THINK* you will see that this is part of the default data: https://github.com/ansible/ansible-modules-core/blob/devel/cloud/amazon/ec2_group.py (lines ~305 ?? possibly 409ish ??) .. so since we don't allow this egress rule, ansible thinks the SG has "changed" when in fact it has not -- and reports "changed" on every run .. for example:

changed: [localhost] => (item={'rules': [{'to_port': 5666, 'from_port': 5666, 'group_name': '1-admin', 'proto': 'tcp'}, ...], 'rules_egress': [{'to_port': 'all', 'from_port': 'all', 'cidr_ip': '10.137.0.0/16', 'proto': 'all'}, ...], 'name': '1-base-zero', 'description': 'Default global SG to be attached to all EC2 instances'})
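for context, the task in question looks roughly like this (a sketch only -- the variable names, group names, and region here are illustrative, not our actual data set):

```yaml
# Illustrative sketch of the kind of task we run on each ansible pass.
# Variable and group names are made up, not our real data.
- name: ensure security groups match our YML data set
  ec2_group:
    name: "{{ item.name }}"
    description: "{{ item.description }}"
    region: us-east-1
    rules: "{{ item.rules }}"
    rules_egress: "{{ item.rules_egress }}"
  with_items: sg_definitions
```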

this is kind of not cool since it shows as changed when nothing has actually changed .. and sure enough, since no real change happened, "AWS Config" does not record a change either .. i think there is a feature request out there for this: https://github.com/ansible/ansible/issues/11249

so then .. here is what happened .. we were minding our own business, ansible ran at 10AM and did its normal "SG business" -- no issues .. it ran again at 12PM, and BAM !! egress rules from an important SG (<< prolly our most important SG) were removed .. what is even more odd is that the output of the successful 10AM run was identical to the output of the 12PM run .. the same "changed" output i alluded to earlier ..

but "AWS Config" revealed all kinds of nastiness .. it showed that we did this:

[screenshot of the AWS Config change record -- not preserved in the archive]

i will say the one "odd" thing we do that stands out in my mind is we do this as part of our data set for egress rules:

      - proto: icmp
        from_port: -1
        to_port: -1
        cidr_ip: "0.0.0.0/0"


other than a few comments we add into the array, it is all pretty normal ..
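one hunch about why that icmp rule stands out: the module compares the rules you declare against what the AWS API reports back, and if the two sides represent "all ICMP ports" differently (we say -1, the API might hand back something else), a naive field-by-field comparison never matches. this is a simplified sketch of that idea, NOT the actual ec2_group code, and the `None`-from-the-API part is an assumption:

```python
# Sketch of how a perpetual "changed" can arise from rule comparison.
# This is NOT the real ec2_group logic -- it just illustrates the idea
# that two representations of the same rule can compare unequal.

def rule_key(proto, from_port, to_port, cidr):
    """Naive rule identity: every field must match exactly."""
    return (str(proto), from_port, to_port, cidr)

declared = rule_key("icmp", -1, -1, "0.0.0.0/0")      # what our YML says
reported = rule_key("icmp", None, None, "0.0.0.0/0")  # hypothetical API form

print(declared == reported)  # -> False: the rule looks "changed" every run

# Normalizing the "all ports" sentinel first makes the comparison stable:
def normalize(port):
    """Treat -1 and None as the same 'all ports' sentinel."""
    return None if port in (-1, None) else int(port)

declared_n = ("icmp", normalize(-1), normalize(-1), "0.0.0.0/0")
reported_n = ("icmp", normalize(None), normalize(None), "0.0.0.0/0")
print(declared_n == reported_n)  # -> True
```

if something like this is going on, it would explain the cosmetic "changed" on every run -- though not by itself why the egress rules actually got deleted once.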

sooooo .. any ideas about WTF happened ?? we are reaching out to AWS for support as well but no info to share there yet ..

thanks for any help you can offer ..

NOTE: this has only happened once in 50+ executions .. we re-ran the exact same ansible play to fix what was broken .. so that adds even more weirdness to it



Gregory Spranger

Jul 7, 2015, 11:32:43 AM
to ansible...@googlegroups.com
just some more info .. for the SG that "failed" -- we have 15 ingress rules and 20 egress rules .. not sure if that matters ..

thanks !! 