Dennis,
Essentially, it sounds like you're describing exactly what Amazon Launch
Configurations (LC) and Auto Scaling Groups (ASG) are meant to do:
launch new instances in response to triggered events. You can use either
CloudWatch metrics as triggers or your own processes that raise an event
through SNS/SQS to increase or decrease the number of running nodes.
That should cover pretty much all of this, with some caveats for C.
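To make the scaling side concrete, here is a minimal Python sketch of
what a simple ASG scaling policy does with a CloudWatch-style alarm. It
nudges the desired capacity up or down by one and clamps it to the
group's min/max. The `ScalingGroup` class, the `on_cpu_alarm` function
and the 70%/20% CPU thresholds are hypothetical illustrations, not the
AWS API.

```python
from dataclasses import dataclass


@dataclass
class ScalingGroup:
    """Hypothetical stand-in for an ASG's capacity settings (not the AWS API)."""
    min_size: int
    max_size: int
    desired: int


def apply_adjustment(group, adjustment):
    """Change desired capacity by `adjustment`, clamped to [min_size, max_size]."""
    group.desired = max(group.min_size, min(group.max_size, group.desired + adjustment))
    return group.desired


def on_cpu_alarm(group, avg_cpu, high=70.0, low=20.0):
    """Scale out above `high`, scale in below `low`, otherwise hold steady.

    The thresholds are illustrative; in AWS they live in CloudWatch alarms.
    """
    if avg_cpu > high:
        return apply_adjustment(group, +1)
    if avg_cpu < low:
        return apply_adjustment(group, -1)
    return group.desired


# Example: a 2-5 node group reacting to a spike and then a lull.
group = ScalingGroup(min_size=2, max_size=5, desired=2)
for cpu in (85.0, 90.0, 50.0, 10.0, 10.0, 10.0):
    print(f"cpu={cpu:5.1f}% -> desired={on_cpu_alarm(group, cpu)}")
```

The loop takes the desired capacity to 3, 4, 4, 3, 2, 2, never dropping
below the minimum of 2. In a real setup the CloudWatch alarm invokes the
policy for you; a custom process (for example one consuming an SQS
queue) could instead call the Auto Scaling API, e.g. boto3's
`set_desired_capacity` or `execute_policy`.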
The trick, or the sticky part to use your phrasing, is the certificates
and the dynamic hostnames used within Amazon. You can work around this
with an autosigning script similar to the one I have in my environment
[1]. It relies on CSR attributes [2], which let you automatically sign
the CSRs that come in from valid instances. The next trick is
designating how each host is supposed to be configured. For this I use
facts within my Hiera configuration on the master side, combined with
server roles. It's not hard to drop custom facts onto an instance during
initialization and then use them on the agent run to determine that
host's purpose and role in life.
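For the autosigning piece, Puppet's policy-based autosigning runs the
executable named by the `autosign` setting with the certname as its only
argument and the PEM-encoded CSR on stdin. Exit status 0 means sign. The
Python sketch below is not the autosigner.rb from [1]. It assumes a
shared secret placed in the CSR's challengePassword attribute (OID
1.2.840.113549.1.9.7) via csr_attributes.yaml, a hypothetical secret
file path on the master, a hypothetical EC2-style certname rule, and
that the `openssl` CLI prints the attribute as `challengePassword :value`.

```python
#!/usr/bin/env python3
"""Sketch of a Puppet policy-based autosign executable (hypothetical, not autosigner.rb).

Puppet calls it as `<script> <certname>` with the PEM CSR on stdin and
signs the certificate only if the script exits with status 0.
"""
import hmac
import re
import subprocess
import sys

# Hypothetical path on the master holding the shared secret that user-data
# also writes into the agent's csr_attributes.yaml.
SECRET_FILE = "/etc/puppetlabs/puppet/autosign_secret"
# Hypothetical naming rule for EC2 internal hostnames, e.g. ip-10-0-1-23.ec2.internal.
CERTNAME_RE = re.compile(r"^ip-\d{1,3}(-\d{1,3}){3}\.[a-z0-9.-]+$")
# Assumes `openssl req -noout -text` prints a line like "challengePassword   :<value>".
CHALLENGE_RE = re.compile(r"^\s*challengePassword\s*:(.*)$", re.MULTILINE)


def extract_challenge(openssl_text):
    """Return the challengePassword from openssl's text dump, or None if absent."""
    match = CHALLENGE_RE.search(openssl_text)
    return match.group(1).strip() if match else None


def should_sign(certname, challenge, secret):
    """Approve only well-formed instance names that present the shared secret."""
    if challenge is None or not CERTNAME_RE.match(certname):
        return False
    # Constant-time comparison avoids leaking the secret through timing.
    return hmac.compare_digest(challenge.encode(), secret.encode())


def main(certname, csr_pem):
    """Decode the CSR with the openssl CLI and return the exit status for Puppet."""
    result = subprocess.run(
        ["openssl", "req", "-noout", "-text"],
        input=csr_pem, capture_output=True, text=True,
    )
    if result.returncode != 0:
        return 1  # Unparseable CSR: leave it for manual review.
    with open(SECRET_FILE) as fh:
        secret = fh.read().strip()
    return 0 if should_sign(certname, extract_challenge(result.stdout), secret) else 1


# Puppet passes exactly one argument (the certname); otherwise do nothing.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1], sys.stdin.read()))
```

Point the master's `autosign` setting at the script and make it
executable. CSRs that fail the check simply stay pending for manual
signing, so a wrong guess in the naming rule fails safe.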
Everything I need to do this in Amazon is made easy by cloud-init, which
sets up the required configuration from user-data that can be included
in the LC for the ASG. You just need to customize the user-data in a
separate LC for each server role; the ASG then handles scaling the
cluster of nodes using its assigned LC. Instances aren't left sitting
around when the load doesn't justify them, and when the triggered events
occur the group scales up or down as defined by the policy in the ASG. I
believe this accomplishes the end state you're looking for, and it is
very doable.
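As a rough sketch of the per-role user-data, the Python function below
renders a `#cloud-config` document that writes csr_attributes.yaml (with
a challengePassword) and an external fact file `role.txt` containing
`role=<name>`, then runs the agent once. The file paths assume Puppet's
all-in-one layout, and the `role` fact name and validation rules are
hypothetical; adjust them to your setup. With that fact in place, a
Hiera hierarchy level such as `role/%{::role}` could select the data
for each role.

```python
import json
import re

# Hypothetical validation rules that keep the generated YAML and shell line safe.
ROLE_RE = re.compile(r"^[a-z0-9_]+$")
HOST_RE = re.compile(r"^[A-Za-z0-9.-]+$")

# Assumed Puppet all-in-one (AIO) paths; older packages use /etc/puppet instead.
CSR_ATTRIBUTES_PATH = "/etc/puppetlabs/puppet/csr_attributes.yaml"
ROLE_FACT_PATH = "/etc/puppetlabs/facter/facts.d/role.txt"
PUPPET_BIN = "/opt/puppetlabs/bin/puppet"


def render_user_data(role, challenge, server="puppet"):
    """Return #cloud-config user-data for one server role (one LC per role)."""
    if not ROLE_RE.match(role):
        raise ValueError(f"unsupported role name: {role!r}")
    if not HOST_RE.match(server):
        raise ValueError(f"unsupported server name: {server!r}")

    # 1.2.840.113549.1.9.7 is the challengePassword OID; json.dumps yields a
    # double-quoted scalar that is also valid YAML.
    csr_attributes = (
        "custom_attributes:\n"
        f"  1.2.840.113549.1.9.7: {json.dumps(challenge)}\n"
    )
    # Facter reads key=value .txt files in facts.d as external facts.
    role_fact = f"role={role}\n"

    lines = ["#cloud-config", "write_files:"]
    for path, content, mode in (
        (CSR_ATTRIBUTES_PATH, csr_attributes, "0600"),
        (ROLE_FACT_PATH, role_fact, "0644"),
    ):
        lines.append(f"  - path: {path}")
        lines.append(f"    permissions: '{mode}'")
        lines.append("    content: |")
        # Indent each file line so it sits inside the YAML block scalar.
        lines.extend("      " + line for line in content.splitlines())
    # One agent run after boot: it submits the CSR (autosigned if valid)
    # and applies the catalog for this role.
    lines.append("runcmd:")
    lines.append(
        f"  - {PUPPET_BIN} agent --onetime --no-daemonize --waitforcert 60 --server {server}"
    )
    return "\n".join(lines) + "\n"
```

Generate one such document per LC, e.g.
`render_user_data("webserver", secret, "puppet.example.internal")`.
Keep in mind that user-data can be read by any process on the instance
through the metadata service, so a secret passed this way is only as
private as the instance itself.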
[1] https://github.com/UGNS/standard-modules/blob/production/scripts/autosigner.rb
[2] https://docs.puppetlabs.com/puppet/latest/reference/ssl_attributes_extensions.html