Terraform vs Ansible vs Cloudformation


Rahul Mehrotra

Feb 4, 2015, 6:29:41 PM
to terrafo...@googlegroups.com
Hi,
I am just getting started with my career in DevOps and have been researching various technologies for automating infrastructure creation and deployment.
Since I have been working on AWS a lot, my question is focused on that cloud provider.

I have been reading lots of blogs and articles where people are experimenting with different tools for infrastructure as code in AWS. I have seen many setups where people use Ansible and CloudFormation, Terraform and Ansible, or just Ansible for coding, deploying and automating the infrastructure.

I know Ansible has lots of built-in modules, which are sufficient on their own to create and orchestrate an infrastructure. I would like to know, in general, what makes you choose one of the three options above over the others.

For example, when would you choose a Terraform and Ansible pair instead of just Ansible?

What are the pros and cons of using Terraform and Ansible vs. just using Terraform for everything?


Thank You :)

David Cunningham

Feb 4, 2015, 9:36:02 PM
to Rahul Mehrotra, terrafo...@googlegroups.com
CAPS (Chef, Ansible, Puppet, Salt) are mainly for centrally controlling what lives inside a large number of instances.  I.e. processes, files, etc.

Terraform and CloudFormation are mainly for creating the instances themselves (and other cloud resources like load balancers etc).  The main difference to me is that CloudFormation is AWS-only and is a hosted service, whereas Terraform supports many cloud providers and can run on your local machine.

I say "mainly" because both bleed a little into the other domain.  E.g. CAPS often have capabilities to create instances etc, although not with as much power as Terraform / CF.

Likewise, one can use Terraform & CF to control the insides of instances, at least at startup (via init scripts), and on GCE, if you set up a metadata hook, you can do it on an ongoing basis as well.

--
You received this message because you are subscribed to the Google Groups "Terraform" group.
To unsubscribe from this group and stop receiving emails from it, send an email to terraform-too...@googlegroups.com.
To post to this group, send email to terrafo...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/terraform-tool/7230fa3f-156a-4851-98f3-9be057d7a668%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

David Cunningham

Feb 4, 2015, 9:44:22 PM
to Rahul Mehrotra, terrafo...@googlegroups.com
Oh, I forgot something very important: Terraform & CF know how to do the minimum amount of work to take your cloud resources from their current state to where you want them to be, including updating resources in place if possible.  I don't think any of the CAPS systems have that kind of sophistication; they just allow you to create / destroy.
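
As a rough illustration of "the minimum amount of work", here is a small Python sketch that diffs a current state against a desired one and emits only the actions that are needed. It is not how Terraform or CloudFormation is implemented; the function name, the resource names, and the idea that `image` is the only attribute that cannot be changed in place are all assumptions made for the example.

```python
def plan_changes(current, desired, immutable=("image",)):
    """Return the smallest list of actions that turns `current` into `desired`.

    Both arguments map a resource name to a dict of attributes.
    `immutable` lists attributes that cannot be updated in place
    (an assumption for this sketch).
    """
    actions = []
    for name in sorted(desired.keys() - current.keys()):
        actions.append(("create", name))
    for name in sorted(current.keys() - desired.keys()):
        actions.append(("destroy", name))
    for name in sorted(current.keys() & desired.keys()):
        old, new = current[name], desired[name]
        changed = {k for k in old.keys() | new.keys() if old.get(k) != new.get(k)}
        if not changed:
            continue  # already in the desired state: nothing to do
        if changed & set(immutable):
            actions.append(("replace", name))  # needs destroy + create
        else:
            actions.append(("update_in_place", name))
    return actions


# Hypothetical resources, for illustration only.
current = {
    "web": {"image": "ami-1", "tags": "v1"},
    "db": {"image": "ami-2", "tags": "v1"},
    "old_lb": {"port": 80},
}
desired = {
    "web": {"image": "ami-1", "tags": "v2"},    # only a tag changed
    "db": {"image": "ami-2", "tags": "v1"},     # unchanged
    "cache": {"image": "ami-3", "tags": "v1"},  # new resource
}
print(plan_changes(current, desired))
```

Here `db` is left alone, `web` is updated in place rather than recreated, and only `cache` and `old_lb` are created and destroyed.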

Dave Dash

Feb 5, 2015, 12:27:29 AM
to David Cunningham, Rahul Mehrotra, terrafo...@googlegroups.com
"Know how to do the minimum?"  I'd consider it a bug if Ansible would provision something and do it in a way that wasn't efficient.

Meanwhile, Terraform is still a growing piece of software.  There have been edge cases found in the last few months where Terraform would destroy an instance and then recreate it in order to make a minor change that didn't require a new instance.

I think Terraform definitely intends to do the minimum.

I share Rahul's desire to know the answer to this.  I've used both Ansible and Terraform, and prefer to use Terraform for resource definitions and Ansible (or Salt) for instance configuration.  There is definitely an allure to having one tool that does both.



Rahul Mehrotra

Feb 5, 2015, 12:44:59 AM
to terrafo...@googlegroups.com, dcu...@google.com, rhlm...@gmail.com
David, the only benefit I have come across for Terraform over Ansible for provisioning is in-place modification. Both of them have the same number of modules for Amazon Web Services. Ansible Tower, on the other hand, gives a nice GUI on top of Ansible.

The main question still remains: in what use case would one go for a Terraform-Ansible combo rather than just Ansible for infrastructure definitions?

Also, is the benefit from Terraform worth its learning curve and maintenance?

I wasn't able to find any information on whether Terraform is more efficient than Ansible at creating cloud infrastructure.


Thank You David and Dave for your valuable insights 
 




Rahul Mehrotra

Feb 5, 2015, 12:48:47 AM
to terrafo...@googlegroups.com, dcu...@google.com, rhlm...@gmail.com
Dave, since you have been using Terraform with Ansible for a while: have you faced any scenarios where Ansible couldn't perform some infrastructure-creation task that Terraform was able to do?

Also, were there any specific issues you faced with Terraform that one should be aware of?

Thank You

Dave Dash

Feb 5, 2015, 1:40:34 AM
to Rahul Mehrotra, terrafo...@googlegroups.com, dcu...@google.com
So I was introduced to both tools at around the same time.  At the time I was unaware that Ansible did more than orchestration and playbooks.  By then I had already been using Terraform to do resource definitions.

So I don't feel like I'm qualified to answer your first question.  I was working on a project with a small enough scope that I could use a shiny new tool (Terraform) which is something I don't usually have the luxury of doing.

The one thing I do like is the "plan" mode, where you can see what's going to happen.  I'm sure Ansible has something similar, but it's not nearly as nice.

Terraform is written in Go and based on goamz (for the Amazon provider).  So there's a feature gap between what Amazon offers and what you can do in Terraform.  This is frustrating, but it's a reality of trying to follow an API.  I think Amazon releasing the Go SDK will help things along quite a bit though.

There are some regressions too; it's the cost of using software that is moving quickly.  You can see some of the issues I've had:


The reason I stick with Terraform is I like the direction it's going.  I really don't like to mix my tools.  I like Salt (or Ansible) for configuration and Terraform (or CF) to define my cloud.  I can see TF 1.0 being very clean and elegant and having enough coverage to make me happy.

-d


Mitchell Hashimoto

Feb 5, 2015, 10:49:52 AM
to Dave Dash, Rahul Mehrotra, <terraform-tool@googlegroups.com>, David Cunningham
I suppose it's just about time for me to jump in here.

First, my disclaimer in this case is that Ansible is in fact the only
tool in the CM-space that I haven't used in a serious way, among the
"CAPS" group as David put it above. There isn't any specific reason I
haven't used it except that I haven't had the need personally. I don't
mean to judge the tool in any way based on this paragraph.

So now, Salt has cloud provisioning, and Chef and Puppet are starting
to as well. I'll share with you my reasons for creating Terraform vs.
augmenting those tools or working with those tools to make them good
cloud provisioning tools.

Given my disclaimer above, I don't know if Ansible does any of these
things. Here we go.

1.) For Puppet and Chef, I felt their agent-based model was just wrong
for this. Puppet/Chef are based on the original foundation that their
agent runs on the host that is also being changed. They've both made
strides to support _both_ models now (puppet apply, chef zero?), but
it is still awkward and doesn't feel 1st class yet. I know Ansible
doesn't do this.

2.) Parallelism in all tools is really subpar. Provisioning cloud
instances takes a good amount of time, and the parallelism didn't
exist when I started TF. Some do it now, but it's still pretty
elementary. Terraform uses a dependency graph, and walks the leaves of
that graph in parallel to get as much parallelism as it can.
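
To make the graph walk concrete, here is a minimal Python sketch of creating resources in dependency order while running independent ones concurrently. It is only an illustration of the general technique using the standard library's `graphlib`, not Terraform's Go implementation; `apply_in_parallel`, `create`, and the resource names are made up for the example.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # Python 3.9+


def apply_in_parallel(deps, create, max_workers=4):
    """Create resources in dependency order, running independent ones concurrently.

    `deps` maps each resource to the set of resources it depends on.
    Returns the order in which resources finished.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError if the graph has a cycle
    finished = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        running = {}
        while sorter.is_active():
            # Everything whose dependencies are done can start right away.
            for node in sorter.get_ready():
                running[pool.submit(create, node)] = node
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise any error from create()
                node = running.pop(future)
                finished.append(node)
                sorter.done(node)  # may unblock dependents
    return finished


def create(name):
    # Stand-in for a slow cloud API call.
    return name


# Hypothetical infrastructure: two subnets can be built at the same time.
deps = {
    "vpc": set(),
    "subnet_a": {"vpc"},
    "subnet_b": {"vpc"},
    "instance": {"subnet_a"},
    "dns": {"instance"},
}
order = apply_in_parallel(deps, create)
print(order)
```

With this graph, `vpc` always finishes first, `subnet_a` and `subnet_b` are submitted together, and `dns` can only start once `instance` is done.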

3.) Gluing together multiple resources. With Terraform you can take
_any_ attribute of _any_ resource/module and use it as an input to
another resource/module. For example, you can say: get the IP address
of that load balancer and configure a DNS provider with it. This basic
functionality is surprisingly difficult for existing tools.

4.) Planning. Terraform has the industry's best plan feature, hands
down. Puppet/Salt have a "no-op" mode but it is different: it shows
you what it would do if you executed it in THAT moment. When you
execute those tools for real, the state of the world may have changed,
and they may do something different. Terraform generates a plan, which
can be saved, and you can tell Terraform to only do that plan. This is
critical for infrastructure more so than a single server because you
want to see the full rollout effect of a single change: will this
change require a DNS change in the middle of the day when TTLs are too
long? Or will this change be done in-place?
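
The difference between a no-op mode and a saved plan can be sketched in a few lines of Python. This is only an illustration under assumptions: the plan format, the function names, and the choice to reject a plan when the state has drifted are invented here and are not taken from Terraform.

```python
import hashlib
import json


def fingerprint(state):
    """Stable hash of a state dict, used to detect drift."""
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


def make_plan(state, desired):
    """Compute the actions once and record which state they were computed against."""
    actions = [["create", n] for n in sorted(desired.keys() - state.keys())]
    actions += [["destroy", n] for n in sorted(state.keys() - desired.keys())]
    actions += [["update", n] for n in sorted(state.keys() & desired.keys())
                if state[n] != desired[n]]
    return {"based_on": fingerprint(state), "actions": actions, "desired": desired}


def apply_plan(plan, state):
    """Execute exactly the saved actions, or refuse if the world has moved on."""
    if plan["based_on"] != fingerprint(state):
        raise RuntimeError("state changed since the plan was saved; plan again")
    new_state = dict(state)
    for verb, name in plan["actions"]:
        if verb == "destroy":
            del new_state[name]
        else:
            new_state[name] = plan["desired"][name]
    return new_state


# Hypothetical resources, for illustration only.
state = {"dns": {"ttl": 300}}
desired = {"dns": {"ttl": 60}, "lb": {"port": 443}}
saved = json.dumps(make_plan(state, desired))  # can be reviewed and stored
result = apply_plan(json.loads(saved), state)
```

The point of the sketch is that `apply_plan` never recomputes anything: it either performs the reviewed actions or stops.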

5.) The core features of a CM tool don't align with a cloud
provisioning tool. I'm going to show a couple examples for this.

5a.) Every CM I know of "refreshes" the state of all its managed
resources on every run. It has to do this to check if it needs to make
any changes. On non-trivial systems, this already takes a significant
amount of CPU/time (dozens of seconds). For cloud systems, these are
all API calls over WAN, and even at 100 servers this takes many dozens
of seconds. As infrastructure scales, you get an O(n) slowdown in
apply times even without changes due to this.

With Terraform, we haven't exposed it yet, but we've designed the core
in a way that we support partial refresh. In this mode, Terraform will
only refresh resources that are highest priority (config change from
cached state, new resource, deleted resource), things that are _likely
to change_. For remaining resources, it will not refresh them, or it
will refresh them X% at a time (you choose X).

If you then run Terraform periodically, this will amortize the cost of
refreshes at the cost of some accuracy of state. But realistically
we've found that servers don't change that often.
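
A partial refresh along these lines could look roughly like the Python sketch below. Since the feature is described as not yet exposed, everything here is an assumption: the function name, the inputs, and the use of random sampling to pick the X% are just one plausible reading.

```python
import math
import random


def pick_refresh_set(resources, likely_changed, percent, rng=None):
    """Choose which resources to refresh on this run.

    Always includes the resources that are likely to have changed, plus a
    random `percent` of the remaining ones (rounded up).
    """
    rng = rng or random.Random()
    priority = set(likely_changed) & set(resources)
    rest = sorted(set(resources) - priority)
    sample_size = math.ceil(len(rest) * percent / 100)
    return priority | set(rng.sample(rest, sample_size))


# Hypothetical fleet of 100 servers, two of which have config changes.
resources = [f"server-{i}" for i in range(100)]
picked = pick_refresh_set(resources, {"server-3", "server-7"}, 10, random.Random(0))
print(len(picked))  # 2 priority resources + 10 of the other 98
```

Run periodically, each resource is still refreshed eventually, but each individual run makes far fewer API calls.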

5b.) CMs don't support more complicated lifecycle management. As an
example, Terraform already has "create before destroy" which says that
if a resource needs to be destroyed, Terraform should create the new
one first before destroying the old one. This allows really basic
features to minimize downtime. More lifecycle options are coming in
Terraform: rolling deploys, etc.

5c.) Incorrect destroy ordering. To be fair, Terraform still has some
issues with this, but they're edge cases. In things such as Puppet, if
you `ensure => absent` (to delete) resources, Puppet routinely
destroys them in an incorrect ordering, and cloud providers sometimes
don't allow that. For example, to delete a VPC, you need to make sure
all things using that VPC (subnets, route tables, instances, EIPs,
etc.) are all gone _first_. Terraform does this, other CMs don't,
because they haven't had to think about a destroy ordering ever
before.
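
One simple way to get a valid destroy ordering is to reverse the creation order of the dependency graph. The Python sketch below shows that idea with the standard library's `graphlib`; it is an illustration, not Terraform's actual algorithm, and the resource names are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def destroy_order(deps):
    """Return an order in which every resource is destroyed before what it depends on.

    `deps` maps each resource to the set of resources it depends on.
    """
    create_order = list(TopologicalSorter(deps).static_order())
    return list(reversed(create_order))


# Hypothetical VPC and the things that live inside it.
deps = {
    "vpc": set(),
    "subnet": {"vpc"},
    "route_table": {"vpc"},
    "instance": {"subnet"},
    "eip": {"instance"},
}
order = destroy_order(deps)
print(order)
```

Whatever order the independent resources come out in, `vpc` is always last, after everything that uses it is gone.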

5d.) Multi-team parallelism on subresources. This is coming in
Terraform in the future, but it's another thing we thought about early
on that CMs don't do at all that people need. With Terraform, you want
to be able to allow multiple teams to modify the same infrastructure
at the same time, _safely_. To do this, Terraform will grab a
"sub-tree lock" on the resources that a plan says it will touch. If
another team member on another machine runs Terraform and the
sub-trees overlap, then it will error. But if they don't overlap,
parallel infrastructure change can happen.
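
Since this feature is described as future work, the following Python sketch is purely speculative. It assumes resources are addressed by slash-separated paths and that locking a path also locks everything beneath it; the class and the path names are invented for the example.

```python
class SubtreeLocks:
    """In-memory registry of sub-tree locks (a toy model, not a real lock service)."""

    def __init__(self):
        self._held = {}  # owner -> set of locked paths, each a tuple of components

    @staticmethod
    def _overlaps(a, b):
        # Two sub-trees overlap when one path is a prefix of the other.
        shorter = min(len(a), len(b))
        return a[:shorter] == b[:shorter]

    def acquire(self, owner, paths):
        """Lock every path the plan will touch, or raise if another owner overlaps."""
        wanted = {tuple(p.split("/")) for p in paths}
        for other, held in self._held.items():
            if other == owner:
                continue
            for w in wanted:
                for h in held:
                    if self._overlaps(w, h):
                        raise RuntimeError(f"{'/'.join(w)} is locked by {other}")
        self._held.setdefault(owner, set()).update(wanted)

    def release(self, owner):
        self._held.pop(owner, None)


locks = SubtreeLocks()
locks.acquire("team-a", ["network/vpc"])
locks.acquire("team-b", ["dns/example.com"])  # disjoint sub-tree: allowed
try:
    locks.acquire("team-c", ["network/vpc/subnet_a"])  # inside team-a's sub-tree
    conflict = None
except RuntimeError as err:
    conflict = str(err)
print(conflict)
```

Teams A and B proceed in parallel, while team C gets an error until team A releases its lock.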

6.) Atlas integration. I am mentioning this because Ansible has
Ansible Tower, a pure commercial offering, that they use as a value
add to their cloud provisioning. So, likewise, we have Atlas, in its
early stages that does the same thing.

6a.) Auto-scaling. Ansible does this with Tower, we do this with Atlas
(shipping in 4 to 6 weeks, as I said Atlas is in its infancy but this
was always planned).

6b.) ACLs on sub-resources. Due to Terraform's plan, Atlas will allow
you to define ACLs on resources and their attributes. So you can say
something like this: operator "John" can modify infrastructure only
between 5 PM and 9 AM, and can never modify the root DNS entries of
any of our domains in CloudFlare. "John" can also never destroy
resources X, Y, Z. And Atlas will verify this by checking the plan.
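
Checking a plan against rules like John's could be sketched as follows in Python. The ACL format, the function name, and the resource addresses are all hypothetical; this is not Atlas's actual data model.

```python
def violations(plan, acl, hour):
    """Return a list of human-readable reasons why `plan` breaks `acl`.

    `plan` is a list of (verb, resource) pairs, `hour` is 0-23.
    """
    problems = []
    start, end = acl["allowed_hours"]  # (17, 9) means 5 PM to 9 AM, wrapping midnight
    if start < end:
        in_window = start <= hour < end
    else:
        in_window = hour >= start or hour < end
    if not in_window:
        problems.append(f"changes not allowed at {hour}:00")
    for verb, resource in plan:
        if resource in acl["never_modify"]:
            problems.append(f"{resource} may not be modified")
        elif verb == "destroy" and resource in acl["never_destroy"]:
            problems.append(f"{resource} may not be destroyed")
    return problems


# Hypothetical ACL for operator "John".
john = {
    "allowed_hours": (17, 9),
    "never_modify": {"cloudflare_record.root"},
    "never_destroy": {"X", "Y", "Z"},
}
plan = [
    ("update", "aws_instance.web"),
    ("destroy", "X"),
    ("update", "cloudflare_record.root"),
]
print(violations(plan, john, hour=22))
```

Because the check runs against the plan rather than the live infrastructure, a rejected change never touches anything.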

I hope this helps.

Best,
Mitchell

Rahul Mehrotra

Feb 5, 2015, 2:29:43 PM
to terrafo...@googlegroups.com, daved...@gmail.com, rhlm...@gmail.com, dcu...@google.com
Hi Mitchell,

Thank you for the valuable knowledge. 

I can definitely see how Terraform differs from Ansible in terms of infrastructure deployment.
Also, the upcoming features are something I am looking forward to.

I have got enough knowledge from this discussion to help me decide on the tool set for my infrastructure. Thank you, everyone.

P.S.: My intention was never to compare the two products; each of them has its own pros and cons. I just wished to know the use cases that make the best use of each tool. I guess it finally comes down to what you wish to achieve with your infrastructure.

David Cunningham

Feb 5, 2015, 6:15:01 PM
to Dave Dash, Rahul Mehrotra, terrafo...@googlegroups.com
> "Know how to do the minimum?"  I'd consider it a bug if ansible would provision something and do it in a way that wasn't efficient.

Let's go to the fundamentals (modulo bugs, missing features, etc.).

CAPS are based on CFEngine, whose internal mechanism is condition / response.  I can't find a link to it but I think it was Sean O'Meara (from Chef) who gave a great talk illustrating this point by showing how to make a CM system with a few lines of bash.  It is basically a set of "if (!what_you_want) { do something to rectify the situation }", which you run until convergence: a set of stabilizing feedback loops reminiscent of those that guide autopilots, thermostats, etc.  The inspiration for all of this is described in the original paper "Computer Immunology", written by the author of CFEngine.  But the key thing is that you have a list of conditions that you want, and for each one you write some action describing how to make that condition true.  The systems differ in what language you write this in, how the order is defined, etc., and they also each come with some opinions about configuration languages that you have to deal with.  But that's the general idea.
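
To show what "a set of conditions and rectifying actions, run until convergence" means, here is a toy version in Python rather than bash. It is not taken from the talk mentioned above or from any real CM tool; the rule format and the nginx example are made up.

```python
def converge(system, rules, max_rounds=10):
    """Run every (condition, action) rule until no condition is violated.

    Returns the number of rounds that were needed, counting the final
    round in which nothing had to be fixed.
    """
    for round_number in range(1, max_rounds + 1):
        attempted_fix = False
        for is_satisfied, rectify in rules:
            if not is_satisfied(system):
                rectify(system)
                attempted_fix = True
        if not attempted_fix:
            return round_number
    raise RuntimeError("did not converge")


def nginx_running(s):
    return "nginx" in s["services"]


def start_nginx(s):
    # Starting the service only works once the package is installed.
    if "nginx" in s["packages"]:
        s["services"].add("nginx")


def nginx_installed(s):
    return "nginx" in s["packages"]


def install_nginx(s):
    s["packages"].add("nginx")


system = {"packages": set(), "services": set()}
# The rules are deliberately listed in the "wrong" order.
rules = [(nginx_running, start_nginx), (nginx_installed, install_nginx)]
rounds = converge(system, rules)
print(rounds, system)
```

Even with the rules in an unhelpful order, repeated runs settle into the desired state: the first round installs the package, the second starts the service, and the third finds nothing left to fix.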

With Terraform/CloudFormation you write only what you want (not how to get it) as a giant single entity.  Terraform/CF then computes a diff and holistically decides what to do.  That is a far more sophisticated approach which requires an internal model of the resources it controls, how they inter-relate, and how they can be changed.  The end result is a qualitative improvement to the user experience that promotes thinking about things in a more high-level way, allowing you to handle more complex situations with less headache.  I would like to be able to manage the inside of instances in the same way, but this is quite difficult with conventional Linux distros.  Nix is an unconventional distro and does a good job of it (definitely worth a look if you've never heard of it).  Perhaps CoreOS will also help, being able to layer containers in an organized way.  But anyway, the key thing with Terraform/CF is that you just describe what you want.  If you want to change things, you merely update that description, and the system computes the perfect global update function.  It is not possible to write an Ansible module to do that, unless you reimplement Terraform inside your Ansible module.

As an aside, I'd like to argue against the use of "declarative" to describe condition/response systems like CAPS.  They are self-healing, but they are still fundamentally stateful.  Their state is a function of the history of things that have happened to them.  By analogy:  You get sick, your body makes antibodies, the sickness goes away, but you still have the antibodies.  Back to computers:  You install X, then decide you don't want it any more, so you delete it from your spec.  But there is no action to actually remove it.  You can easily get packages / files / processes / services remaining in existence because you didn't write a rule to explicitly destroy them.  You can add actions saying "if x is there then remove it", but in my experience that doesn't work very well because it can conflict with something else that is creating x in another part of the config (i.e. configs are no longer composable).  The typical methodology is to blow away instances regularly in order to reset the state.  But then you're not managing the state of an instance, you've merely written a script for setting it up in the first place (which is trivial to do without CAPS).

On the other hand, I think Terraform/CF and also Nix really are "declarative", as they really raise the abstraction level and take care of low-level actions in a transparent and reliable way.  In particular, the state they manage will always be a function of the current configuration, regardless of what the previous configurations were.
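
The distinction can be shown with a deliberately tiny Python model, where a "state" and a "spec" are just sets of package names. Both functions are simplifications invented for this example and do not correspond to any real tool's behaviour in detail.

```python
def apply_rules(state, spec):
    """Condition/response style: make sure everything in the spec exists.

    Nothing here ever removes what an earlier spec created.
    """
    return state | spec


def reconcile(state, spec):
    """Declarative style: diff against the spec, so leftovers are removed."""
    to_add = spec - state
    to_remove = state - spec
    return (state | to_add) - to_remove


# The same sequence of configurations: redis is dropped in the second one.
history = [{"nginx", "redis"}, {"nginx"}]

stateful = set()
declarative = set()
for spec in history:
    stateful = apply_rules(stateful, spec)
    declarative = reconcile(declarative, spec)

print(sorted(stateful))     # redis is still installed
print(sorted(declarative))  # only what the current spec asks for
```

After the second configuration, the first result still depends on the history, while the second equals the current spec no matter what came before.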

Jörg Maaß

Jan 11, 2018, 4:56:45 AM
to Terraform
It is actually not true that CAPS cannot maintain the state of your instances and only apply the deltas. I know for a fact (because I worked with it) that e.g. Chef can do that very well. For me, Terraform is really a "provisioner of provisioners", providing an overall control and automation structure over "lesser" CAPS systems. Terraform really shines when it comes to defining your infrastructure as code, whereas CAPS shine when it comes to describing your individual instances as code.