Hey Evan,
I don't have direct experience writing a custom autoscaler, but I have some advice that might help give a broad overview of the possibilities.
One service worth looking into is Cloud Monitoring, which lets you monitor all kinds of metrics related to your application, from CPU usage to request latency (using custom metrics). You can use webhook notifications to tell your control instance(s) to spin instances up or down according to upper and lower bounds on the relevant metrics. There's also the option of using the Compute Engine Autoscaler, which manages instances by grouping them (managed instance groups) and also supports custom metrics.
Overall, the more granular the control over scaling you want, the closer you get to writing entirely custom control logic in an instance, although several scaling and monitoring solutions already exist. Finally, your Feature Request will receive a response shortly; if implemented, it would make much of this unnecessary, but you might well enjoy coding the sort of granular control you described. The main problems to solve are accurately monitoring and projecting available resources against the resources required, and the technical question of how to implement scaling control in line with that.
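For the projection side, one common proportional rule (not necessarily what the Compute Engine Autoscaler does internally) is to treat the current load as instances × average utilization and divide by a target utilization to get the number of instances needed. A rough sketch, assuming utilization is reported as a fraction between 0 and 1; `desired_instances` and its parameters are hypothetical:

```python
import math


def desired_instances(current: int, avg_utilization: float, target_utilization: float,
                      min_instances: int = 1, max_instances: int = 10) -> int:
    """Estimate how many instances would bring average utilization down (or up) to target.

    Load is approximated as current * avg_utilization, in "fully busy instance" units.
    Dividing by the target utilization gives the instance count that would carry that
    load at the target level; rounding up avoids under-provisioning.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    load = current * avg_utilization
    # Small tolerance so float noise (e.g. 6.0000000000000004) doesn't add an extra instance.
    needed = math.ceil(load / target_utilization - 1e-9)
    return max(min_instances, min(max_instances, needed))


if __name__ == "__main__":
    print(desired_instances(4, 0.75, 0.5))  # 6
    print(desired_instances(3, 0.9, 0.6))   # 5
    print(desired_instances(10, 0.1, 0.6))  # 2
```

Unlike the fixed-step approach, this jumps straight to the estimated size, so in practice you'd likely still want a cooldown between resizes.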
I hope this helps get some wheels turning. Let me know if you have any further questions; hopefully others with more practical experience can chime in as well.
Regards,
Nick
Cloud Platform Community Support