How do you create CloudWatch alarms to reboot or recover instances?

Rajinder Singh

Dec 10, 2015, 1:31:55 PM

to Terraform

We have a single instance NAT server. We want to create an alarm that checks the metric StatusCheckFailed_Instance and reboots the instance.

This is described in the documentation below.

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/UsingAlarmActions.html#AddingRebootActions

I am able to define a CloudWatch metric alarm:

https://terraform.io/docs/providers/aws/r/cloudwatch_metric_alarm.html

But I am having trouble defining an alarm action that reboots the instance.

The documentation says this about alarm_actions:

alarm_actions - (Optional) The list of actions to execute when this alarm transitions into an ALARM state from any other state. Each action is specified as an Amazon Resource Name (ARN).

I also want to set up another alarm that checks for nat_system_check_failed and executes the action called "recover this instance".

Most likely this is not possible in Terraform right now, so I will use CloudFormation, but I wanted to be sure before I go that route.

Paul Hinze

Dec 15, 2015, 10:33:28 AM

to terrafo...@googlegroups.com

Hi there,

It looks like there might be a bit of special casing required to get this working. Check out the Amazon API docs for the PutMetricAlarm action (which is what Terraform calls):

https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_PutMetricAlarm.html

Note: You must create at least one stop, terminate, or reboot alarm using the Amazon EC2 or CloudWatch console to create the EC2ActionsAccess IAM role. After this IAM role is created, you can create stop, terminate, or reboot alarms using the CLI.

Those docs also give the format of the ARNs required to make this work.
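
As a rough illustration of what that could look like in Terraform, here is a minimal sketch, not a tested configuration. It assumes a recent Terraform (0.12+ syntax) and that the EC2 action ARNs follow the arn:aws:automate:REGION:ec2:ACTION form listed in the PutMetricAlarm reference; check that page for the exact values that apply to your account. The variable names, alarm names, and thresholds are hypothetical, and StatusCheckFailed_System is assumed to be the metric behind the "recover this instance" alarm.

```hcl
# Hypothetical inputs; replace with values from your own configuration.
variable "region" {
  type    = string
  default = "us-east-1"
}

variable "nat_instance_id" {
  type        = string
  description = "ID of the NAT instance to watch"
}

# Reboot the instance when the instance-level status check fails.
resource "aws_cloudwatch_metric_alarm" "nat_instance_check_failed" {
  alarm_name          = "nat-instance-check-failed"
  alarm_description   = "Reboot the NAT instance when StatusCheckFailed_Instance reports a failure"
  namespace           = "AWS/EC2"
  metric_name         = "StatusCheckFailed_Instance"
  statistic           = "Maximum"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  threshold           = 1
  period              = 60
  evaluation_periods  = 3

  dimensions = {
    InstanceId = var.nat_instance_id
  }

  # Assumed ARN format; verify against the PutMetricAlarm docs.
  alarm_actions = ["arn:aws:automate:${var.region}:ec2:reboot"]
}

# Recover the instance when the system-level status check fails.
resource "aws_cloudwatch_metric_alarm" "nat_system_check_failed" {
  alarm_name          = "nat-system-check-failed"
  alarm_description   = "Recover the NAT instance when StatusCheckFailed_System reports a failure"
  namespace           = "AWS/EC2"
  metric_name         = "StatusCheckFailed_System"
  statistic           = "Maximum"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  threshold           = 1
  period              = 60
  evaluation_periods  = 2

  dimensions = {
    InstanceId = var.nat_instance_id
  }

  # Assumed ARN format; verify against the PutMetricAlarm docs.
  alarm_actions = ["arn:aws:automate:${var.region}:ec2:recover"]
}
```

Whether the reboot action is accepted may still depend on the IAM role described in the note above, so treat this as a starting point.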

Hope this helps!

Paul


ja...@fpcomplete.com

Dec 15, 2015, 10:40:37 AM

to Terraform

On Thursday, December 10, 2015 at 1:31:55 PM UTC-5, Rajinder Singh wrote:

We have a single instance NAT server. We want to create an alarm that checks the metric StatusCheckFailed_Instance and reboots the instance.

Another way to slice the pie: a single-instance Auto Scaling group, with a health metric and a policy that re-creates the instance. You would need to use the AWS API to update the network routing table association on instance init/boot, but that is relatively simple and clear.
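
For anyone curious what that might look like, here is a rough, untested sketch in Terraform (0.12+ syntax). Everything named here is hypothetical: the variables, the instance type, and the boot script. It assumes the AMI ships with the AWS CLI, that the instance metadata endpoint answers plain (IMDSv1-style) requests, that the subnet auto-assigns public IPs, and that the instance profile allows ec2:ReplaceRoute and ec2:ModifyInstanceAttribute. The "routing table update" is approximated by replacing the default route so that it points at the new instance.

```hcl
# Hypothetical inputs; none of these names come from the thread.
variable "nat_asg_region" {
  type = string
}

variable "nat_ami_id" {
  type = string
}

variable "nat_subnet_id" {
  type        = string
  description = "Public subnet the NAT instance runs in"
}

variable "private_route_table_id" {
  type        = string
  description = "Route table whose default route should point at the NAT instance"
}

variable "nat_instance_profile_name" {
  type        = string
  description = "Instance profile allowing ec2:ReplaceRoute and ec2:ModifyInstanceAttribute"
}

resource "aws_launch_template" "nat" {
  name_prefix   = "nat-"
  image_id      = var.nat_ami_id
  instance_type = "t3.micro"

  iam_instance_profile {
    name = var.nat_instance_profile_name
  }

  # On boot, the instance disables its source/dest check and takes over
  # the default route. Assumes the AWS CLI is installed on the AMI.
  user_data = base64encode(<<-EOT
    #!/bin/bash
    INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
    aws ec2 modify-instance-attribute --region ${var.nat_asg_region} \
      --instance-id "$INSTANCE_ID" --no-source-dest-check
    aws ec2 replace-route --region ${var.nat_asg_region} \
      --route-table-id ${var.private_route_table_id} \
      --destination-cidr-block 0.0.0.0/0 --instance-id "$INSTANCE_ID"
  EOT
  )
}

# A group pinned to exactly one instance: if the EC2 health check marks
# the instance unhealthy, the group replaces it with a fresh one.
resource "aws_autoscaling_group" "nat" {
  name_prefix         = "nat-"
  min_size            = 1
  max_size            = 1
  desired_capacity    = 1
  vpc_zone_identifier = [var.nat_subnet_id]
  health_check_type   = "EC2"

  launch_template {
    id      = aws_launch_template.nat.id
    version = "$Latest"
  }
}
```

The trade-off compared with a reboot or recover alarm is that the replacement instance gets a new ID and, unless an Elastic IP is reattached, a new public address.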

I started down this route but have not had a need to complete it and put it to use.
