Powering EC2 instances on/off


C. S.

Apr 21, 2014, 12:29:03 PM
to ansible...@googlegroups.com
Hi folks,

We’re trying to implement a system where we can power environments on and off in AWS when they’re not in use. However, the EC2 inventory module excludes instances that are not in a running state. It seems like adding an option to the ec2 module to include stopped instances would work, but then Ansible would need a corresponding option to call the module with to include the stopped instances. Which seems a bit hacky…

Maybe ansible needs a notion of host state? Any thoughts?

Thx!

-cs
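
[Editor's note: since ec2.py won't report stopped hosts, one common workaround (until inventory grows a notion of host state) is to keep the environment's instance IDs somewhere static and start them with the ec2 module before the rest of the play runs. A minimal sketch; the vars file and the env_instance_ids variable are assumptions, not anything from this thread:]

# Hypothetical vars file (the dynamic inventory can't supply these
# while the instances are stopped):
#   env_instance_ids:
#     - i-0abc1234
#     - i-0def5678
- name: Power the environment on before managing it
  local_action:
    module: ec2
    state: running
    instance_ids: "{{ env_instance_ids }}"
    region: us-east-1
    wait: yes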

Scott Anderson

Apr 21, 2014, 1:10:38 PM
to ansible...@googlegroups.com
I use this module: https://github.com/ansible/ansible/pull/6349

Full disclosure: Michael believes all inventory should be done via inventory scripts; I respectfully disagree. :-) I find ec2.py to be very slow (20 seconds to refresh the cache with a small number of instances, for example) and prefer querying inventory directly in the script itself for many use cases.

Regards,
-scott

C. S.

Apr 21, 2014, 1:39:20 PM
to ansible...@googlegroups.com
Thanks!

That’s interesting: your module is the same as ec2_facts, just with filtering added. And the ec2_facts module says in its notes that it may add filtering. I think I’d agree with Michael’s point of view, but it looks like we’ve already gone down the path of facts being outside the inventory module, so maybe a pull request against ec2_facts with the filters would get accepted. Long run, it does seem like hosts and modules need to have some idea of state…


--
You received this message because you are subscribed to the Google Groups "Ansible Project" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ansible-proje...@googlegroups.com.
To post to this group, send email to ansible...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ansible-project/e10fe6c9-e2b9-4ddb-9797-71c0602ea7c2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Scott Anderson

Apr 21, 2014, 1:45:22 PM
to ansible...@googlegroups.com
Actually, it’s not the same as ec2_facts, other than that it returns facts about an instance.

ec2_facts only works when run on an actual AWS instance (it calls the Amazon ec2 metadata servers) and it only retrieves the facts for that instance alone.

ec2_instance_facts, on the other hand, can retrieve multiple instance facts at once from anywhere (I use it in a local action). It’s more like ec2.py run for specific instances from within a playbook.

Regards,
-scott
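
[Editor's note: to make the distinction concrete, a sketch of both. The ec2_instance_facts parameters follow Scott's longer example later in this thread and come from the unmerged PR, so treat them as assumptions:]

# ec2_facts must run ON the instance: it queries the EC2 metadata
# service at 169.254.169.254 and returns facts for that host only.
- name: Gather this instance's own metadata
  ec2_facts:

# ec2_instance_facts (unmerged, PR #6349) talks to the EC2 API, so it
# can run anywhere and return facts for many instances at once.
- name: Look up instances by tag from the control machine
  local_action:
    module: ec2_instance_facts
    tags:
      environment: dev
    region: us-east-1
  register: instance_facts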


C. S.

Apr 21, 2014, 1:54:39 PM
to ansible...@googlegroups.com
Thanks for the clarification. Right, the use case and implementation are a bit different. Seems like they could be combined, however.

Michael DeHaan

Apr 23, 2014, 9:04:46 AM
to ansible...@googlegroups.com
"so maybe a pull request against ec2_facts with the filters would get accepted. Long run it does seem like hosts and modules need to have some idea of state … "

Anything applying to more than one host definitely shouldn't be done by the facts module.




ghe...@gmail.com

Apr 25, 2014, 2:31:59 PM
to ansible...@googlegroups.com
So, I'm curious: for the case where you want to start "stopped" EC2 instances, what's the current recommended approach?

I've kind of ignored this task for now, managing it by hand (it's just our dev env, but it's still a couple of dozen instances at least). I'm almost about to pull Scott's branch in locally, since it looks so much better than manual management.

Scott Anderson

Apr 25, 2014, 2:59:23 PM
to ansible...@googlegroups.com

On Apr 25, 2014, at 2:31 PM, ghe...@gmail.com wrote:

> So, I'm curious, for the case where you want to start "stopped" EC2 instances, what's the current recommended approach?
>
> I've kind of ignored this task for now, managing that by hand (it's just our dev env, but it's still a couple of dozen instances at least). I'm almost about to pull Scott's branch in locally since it looks so much better than manual management.


Here’s an example in case you do use ec2_instance_facts. This example creates maintenance instances for updating AMIs.

Notes:
    * This is part of a set of scripts that will create an entire load balanced application environment (including DNS, VPC, centralized logging, and RDS) in a bare AWS account in about 20-30 minutes.
    * app_environment is dev, test, stage, or prod. The scripts will create the same setup in each environment with some differences such as RDS size, domain name, and so forth. 
    * I use a naming convention for AWS resources of ‘<product>-<environment>-<AWS type>-<purpose>’, e.g. foo-stage-ec2-logging or foo-prod-ami-web.

# The base image is created from a standard Ubuntu LTS instance. Then, packages common to all
# of the images (e.g. security, ansible, boto, etc.) are installed and configured.

# There’s a separate pull request (also rejected, hi Michael... ;-) for the ec2_ami_facts module.
- name: Obtain list of existing AMIs
  local_action:
    module: ec2_ami_facts
    description: "{{ ami_image_name }}"
    tags:
      environment: "{{ app_environment }}"
    region: "{{ vpc_region }}"
    aws_access_key: "{{ aws_access_key }}"
    aws_secret_key: "{{ aws_secret_key }}"
  register: ami_facts
  ignore_errors: yes

# If a version of the AMI exists, record this. Otherwise use the base Ubuntu image.
- set_fact:
    environment_base_image_id: "{{ ami_facts.images[0].id }}"
  when: ami_facts.images|count > 0
- set_fact:
    environment_base_image_id: "{{ ami_base_image_id }}"
  when: ami_facts.images|count == 0
    
# See if the maintenance image for this image type for this environment is running.    
- name: Obtain list of existing instances
  local_action:
    module: ec2_instance_facts
    name: "{{ ami_maint_instance_name }}"
    # Everything but terminated
    states:
      - pending
      - running
      - shutting-down
      - stopped
      - stopping
    tags:
      environment: "{{ app_environment }}"
    region: "{{ vpc_region }}"
    aws_access_key: "{{ aws_access_key }}"
    aws_secret_key: "{{ aws_secret_key }}"
  register: instance_facts
  ignore_errors: yes

- set_fact:
    environment_maint_instance: "{{ instance_facts.instances_by_name.get(ami_maint_instance_name) }}"
  when: instance_facts.instances|count > 0

# If there is no such instance, create one.
- name: Create an instance for managing the AMI creation
  local_action:
    module: ec2
    state: present
    image: "{{ environment_base_image_id }}"
    instance_type: t1.micro
    group: "{{ environment_public_ssh_security_group }}"
    instance_tags:
      Name: "{{ ami_maint_instance_name }}"
      environment: "{{ app_environment }}"
    key_name: "{{ environment_public_ssh_key_name }}"
    vpc_subnet_id: "{{ environment_vpc_public_subnet_az1_id }}"
    assign_public_ip: yes
    wait: yes
    wait_timeout: 600
    region: "{{ vpc_region }}"
    aws_access_key: "{{ aws_access_key }}"
    aws_secret_key: "{{ aws_secret_key }}"
  register: maint_instance
  when: environment_maint_instance is not defined

- set_fact:
    environment_maint_instance: "{{ maint_instance.instances[0] }}"
  when: maint_instance is defined and maint_instance.instances|count > 0

- name: Ensure instance is running
  local_action:
    module: ec2
    state: running
    instance_ids: "{{ environment_maint_instance.id }}"
    wait: yes
    wait_timeout: 600
    region: "{{ vpc_region }}"
    aws_access_key: "{{ aws_access_key }}"
    aws_secret_key: "{{ aws_secret_key }}"
  register: maint_instance
  when: environment_maint_instance is defined

# If we had to start the instance then the public IP will not have been defined when
# we gathered facts above, so get it again.
- name: Obtain public IP of newly running instance
  local_action:
    module: ec2_instance_facts
    name: "{{ ami_maint_instance_name }}"
    states:
      - running
    tags:
      environment: "{{ app_environment }}"
    region: "{{ vpc_region }}"
    aws_access_key: "{{ aws_access_key }}"
    aws_secret_key: "{{ aws_secret_key }}"
  register: instance_facts
  when: maint_instance|changed

- set_fact:
    environment_maint_instance: "{{ instance_facts.instances_by_name.get(ami_maint_instance_name) }}"
  when: maint_instance|changed

# Pass the collected facts on the new maintenance image host for configuration by role.
- name: Add new maintenance instance to host group
  local_action:
    module: add_host
    hostname: "{{ environment_maint_instance.public_ip }}"
    groupname: maint_instance
    app_environment: "{{ app_environment }}"
    # This passes the new/existing private key file to ansible for use in contacting the hosts. Better way to do this?
    ansible_ssh_private_key_file: "{{ environment_public_ssh_private_key_file }}"
    environment_maint_instance: "{{ environment_maint_instance }}"

- name: Wait for SSH on maintenance host
  local_action:
    module: wait_for
    host: "{{ environment_maint_instance.public_ip }}"
    port: 22
    # This is annoying as Hades. Sometimes the delay works, sometimes it's not enough.
    # The check fails if the port is open but the ssh daemon isn't yet ready to accept
    # actual traffic, right after the maintenance instance is started.
    #delay: 10
    timeout: 320
    state: started

# TODO fix the hardcoded user too
- name: Really wait for SSH on maintenance host
  local_action: command ssh -o StrictHostKeyChecking=no -i {{ environment_public_ssh_private_key_file }} ubuntu@{{ environment_maint_instance.public_ip }} echo Rhubarb
  register: result
  until: result.rc == 0
  retries: 20
  delay: 10

Regards,
-scott

James Carroll

Apr 25, 2014, 3:08:07 PM
to ansible...@googlegroups.com
I'm fairly new to Ansible.  How do I get your code into my Ansible install so I can use it?  I run from source.

Thanks!

James



Scott Anderson

Apr 25, 2014, 3:10:07 PM
to ansible...@googlegroups.com

On Apr 25, 2014, at 3:08 PM, James Carroll <james....@idmworks.com> wrote:

> I'm fairly new to Ansible. How do I get your code into my Ansible install so I can use it? I run from source.

I keep all of my new/modified modules in a library directory under where my playbooks are. Ansible will find the modules there and use them in preference to the ones in the Ansible install.

Regards,
-scott
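
[Editor's note: for reference, the layout Scott describes might look like this (file names here are illustrative); Ansible checks a ./library directory next to the playbook before its built-in module path:]

playbooks/
    site.yml
    library/
        ec2_instance_facts
        ec2_ami_facts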

Michael DeHaan

Apr 25, 2014, 4:57:54 PM
to ansible...@googlegroups.com
Using local ./library content is fine, but please don't run a fork with extra modules added if you are going to ask questions about them -- or at least identify that you are when you do.

It can make Q&A very confusing when people ask about things that aren't merged.


Gustavo Hexsel

Apr 25, 2014, 7:17:16 PM
to ansible...@googlegroups.com
Just as a side-note, I was able to get the wait_for module to work for SSH with a bit of fiddling (so you don't have to wait with two tasks):

- hosts: 127.0.0.1
  connection: local
  gather_facts: false
  vars_files:
    - env.yaml
  tasks:
    - name: Wait for SSH to come up after the reboot
      wait_for: host={{item}} port=22 delay=60 timeout=90 state=started
      with_items: groups.tag_env_{{pod}}_
      ignore_errors: yes
      register: result
      until: result.failed is not defined
      retries: 5


This seems to work for me all the time, but maybe I just got lucky. I create groups based on tags, e.g. "class_database", "class_monitoring", "env_qa1", which I register using add_host.
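
[Editor's note: a sketch of that registration step, assuming an earlier ec2 task registered its result as ec2_result; the variable and group names are illustrative:]

- name: Register new instances into tag-based groups
  local_action:
    module: add_host
    hostname: "{{ item.public_ip }}"
    groupname: "tag_env_{{ pod }}_"
  with_items: ec2_result.instances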




Michael DeHaan

Apr 26, 2014, 8:36:00 AM
to ansible...@googlegroups.com
I'm always a bit wary when so many keywords come together. It's usually a sign that something can be simplified and is not "Ansible-like" enough.

    - name: Wait for SSH to come up after the reboot
      wait_for: host={{item}} port=22 delay=60 timeout=90 state=started
      with_items: groups.tag_env_{{pod}}_
      ignore_errors: yes
      register: result
      until: result.failed is not defined
      retries: 5

Can likely be simplified to:

  - hosts: localhost
    tasks:
       - ec2: # provisioning step here with add_host...

  - hosts: tag_env_{{ pod }}_
    tasks:
      - name: Wait for SSH to come up after the reboot
        local_action: wait_for host={{ inventory_hostname }} port=22 delay=60 timeout=90

   
A few key concepts:

(A) Using the host loop is clearer than doing a "with_items" across the group

(B) You should only need to do one wait_for.  Consider increasing the timeout rather than looping over a retry

(C) You should not need to register the result of the retry since there is no loop

(D) You won't need to ignore errors because we're running wait_for off localhost, which we know we can connect to.





Gustavo Hexsel

Apr 26, 2014, 1:32:44 PM
to ansible...@googlegroups.com
Then I can consider this a bug report. Without retries, wait_for fails for every EC2 AMI I've tried (admittedly, they're all variations of CentOS).

Things I've seen:
- it reports the port open, then refuses to connect
- it times out even though I was able to log in manually prior to the timeout
- it fails with ssh errors while checking the port (this one is a bit rare)

This combination is less than ideal, but it seemed to work for all my cases. Also, a minor thing: you have an ec2 task and then you start using the groups.tag_xxx; is it implied you have an add_host there? Because my EC2 instances won't appear unless I add that.
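
[Editor's note: for the "port open but sshd not ready" race specifically, wait_for's search_regex parameter (added in Ansible 1.4) may help: instead of just accepting the TCP connection, it waits until the port returns data matching the pattern, i.e. the SSH banner. A sketch, assuming the AMI runs OpenSSH and reusing the group pattern from above:]

- name: Wait until sshd is actually answering, not just listening
  local_action: wait_for host={{ item }} port=22 search_regex=OpenSSH timeout=320
  with_items: groups.tag_env_{{ pod }}_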


Gustavo Hexsel

Apr 26, 2014, 1:33:44 PM
to ansible...@googlegroups.com
Nvm, saw the add_host in the comment.