Greetings,
I've been having fun recently throwing AWS resources around in
Terraform! Generally it's working fine – at worst I've seen
cyclic dependencies in my change which I had to break into a
couple of steps, or a dependency breakage which fixed itself on a
second run. However, I have just run into one interesting case
which the tool doesn't seem to be able to automate its way out
of.
The case is this: I have a subnet that I want to delete and
recreate in a different VPC. When I try, Terraform hangs and then
errors out, like this:
    * aws_subnet.foo-b: Error deleting subnet: timeout while waiting for state to become 'destroyed'
    * aws_subnet.foo-a: Error deleting subnet: timeout while waiting for state to become 'destroyed'
This recurs if I rerun the deploy.
"Hm," I thought, "I wonder what happens if I just go into my AWS
console and delete it manually?" I get this error:
    The following subnets contain one or more instances or network
    interfaces. You cannot delete these subnets until those instances
    have been terminated, and the network interfaces have been
    deleted.
In my case it's instances, and the instances in question are
associated with an autoscaling group. I am managing the ASG in
Terraform, but because the instances are automatically spun up by
scaling policies, they are not managed directly in Terraform. To
unblock Terraform, I think I will have to manually delete the ASG
and its instances before changing the subnet will work.
I'm not quite sure how to categorize this: crazy corner case,
expected but wonky behavior, a bug, a feature request, or what?
It seems to me that there might be a reasonable path for
automation here:
- Are any ASGs dependent on the subnet to be deleted (looking at
  vpc_zone_identifier(s))?
  - yes => Are those ASGs also being deleted?
    - yes =>
      - "find all instances associated with those ASGs and terminate
        them along with the ASG"
      - delete the subnet
    - no => block the operation because of a Terraform dependency
  - no => proceed as usual
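To make the decision tree above concrete, here is a rough sketch in Python. Everything in it is hypothetical: the function name, the dict of ASG name to subnet IDs, and the step strings are my own invention for illustration, not anything Terraform exposes.

```python
def plan_subnet_delete(subnet_id, asg_subnets, asgs_being_deleted):
    """Return the ordered steps for deleting a subnet.

    asg_subnets: hypothetical mapping of ASG name -> list of subnet IDs
                 (what each ASG lists in vpc_zone_identifier).
    asgs_being_deleted: set of ASG names removed in the same change.
    """
    # Which ASGs place instances in this subnet?
    dependents = sorted(
        name for name, subnets in asg_subnets.items() if subnet_id in subnets
    )
    if not dependents:
        # No ASG depends on the subnet: proceed as usual.
        return ["delete subnet " + subnet_id]

    # Any dependent ASG that survives the change blocks the deletion.
    kept = [name for name in dependents if name not in asgs_being_deleted]
    if kept:
        return [
            "block: subnet %s is still used by ASG(s) %s"
            % (subnet_id, ", ".join(kept))
        ]

    # Every dependent ASG is going away too: tear those down first.
    steps = ["terminate instances and delete ASG " + name for name in dependents]
    steps.append("delete subnet " + subnet_id)
    return steps
```

For example, plan_subnet_delete("subnet-a", {"web": ["subnet-a"]}, {"web"}) yields the ASG teardown step followed by the subnet deletion, while passing an empty set as the last argument yields a single "block" step.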
The difficulty, I think, is the operation in quotes – is it in
fact possible to roll up the ASG and all of its instances in the
same breath? If you just delete instances before deleting the
ASG, I imagine a policy might be able to spin up another one
before you can reap the ASG. If you delete the ASG first, I don't
know if you can correctly identify the instances that are OK to
reap.
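One possible way around that race, as far as I can tell from the AWS API: the DeleteAutoScalingGroup call takes a ForceDelete flag that removes the group together with its instances. Below is a minimal sketch, assuming a boto3 "autoscaling" client is passed in. The function name is hypothetical, and pinning the capacity to zero first is my own belt-and-braces assumption, not something I know to be required.

```python
def drain_and_delete_asg(autoscaling, name):
    """Delete an ASG and its instances in one go.

    autoscaling: assumed to be a boto3 "autoscaling" client,
                 e.g. boto3.client("autoscaling").
    name: the Auto Scaling group name.
    """
    # Pin capacity to zero so scaling policies cannot launch replacements
    # while the group is being torn down (an extra precaution).
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=name,
        MinSize=0,
        MaxSize=0,
        DesiredCapacity=0,
    )
    # ForceDelete asks AWS to delete the group along with its instances,
    # without waiting for the instances to finish terminating.
    autoscaling.delete_auto_scaling_group(
        AutoScalingGroupName=name,
        ForceDelete=True,
    )
```

Note that the instances terminate asynchronously, so a subnet deletion issued right afterwards may still need to wait and retry until their network interfaces are gone.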
If we can't actually manage the deletion in this case, but we can
recognize the problem, perhaps we could simply have Terraform
report the dependency issue rather than timing out?
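Recognizing the problem looks feasible to me: EC2's DescribeNetworkInterfaces call can be filtered by subnet-id, and anything it returns would block the deletion. Here is a minimal sketch, assuming a boto3 "ec2" client is passed in. The function name is hypothetical, and I ignore pagination for brevity.

```python
def subnet_blockers(ec2, subnet_id):
    """List what still occupies a subnet.

    ec2: assumed to be a boto3 "ec2" client, e.g. boto3.client("ec2").
    Returns (network_interface_id, instance_id_or_None) tuples.
    """
    resp = ec2.describe_network_interfaces(
        Filters=[{"Name": "subnet-id", "Values": [subnet_id]}]
    )
    blockers = []
    for eni in resp.get("NetworkInterfaces", []):
        # An unattached interface has no Attachment block; others may be
        # attached to something that is not an EC2 instance.
        instance_id = eni.get("Attachment", {}).get("InstanceId")
        blockers.append((eni["NetworkInterfaceId"], instance_id))
    return blockers
```

If the list is non-empty, the tool could print those IDs and stop right away instead of waiting for the 'destroyed' timeout.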
Curious to hear your thoughts. I'm brand-new to Terraform and
new-ish to AWS, so maybe it's not as hard as I think it is, or
maybe it's a known limitation.
Thanks!
-- Owen