Our project runs on the Google App Engine standard environment with autoscaling configured as shown below. Warmup requests are enabled in the app, and we use the Google Endpoints service. However, I am seeing high latency in several scenarios.

Environment: Java 8
Instance type: F4_1G

Autoscaling configuration:
min-instances: 2
max-concurrent-requests: 80
min-pending-latency: 6s
max-pending-latency: 10s
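
For reference, the settings above would look roughly like this in appengine-web.xml. This is a sketch reconstructed from the values listed, not a copy of the project's file: the threadsafe element is an assumption (it is implied by max-concurrent-requests being set), and all other elements of the real file are omitted.

```xml
<?xml version="1.0" encoding="utf-8"?>
<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
  <runtime>java8</runtime>
  <!-- Assumption: concurrent requests are enabled, since max-concurrent-requests is set -->
  <threadsafe>true</threadsafe>
  <instance-class>F4_1G</instance-class>
  <warmup-requests-enabled>true</warmup-requests-enabled>

  <automatic-scaling>
    <!-- No max-instances element, so the instance count is not capped -->
    <min-instances>2</min-instances>
    <max-concurrent-requests>80</max-concurrent-requests>
    <!-- A request waits at least 6s in the pending queue before a new instance may be started for it -->
    <min-pending-latency>6s</min-pending-latency>
    <max-pending-latency>10s</max-pending-latency>
  </automatic-scaling>
</appengine-web-app>
```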
I tested with JMeter, configured to send 85 asynchronous requests with a ramp-up period of 10 seconds. From the application logs I can see that App Engine takes a long time to serve the requests. My questions are below.
1. Most of the requests fail because they exceed the time limit. In image 1, a request takes 88.2 seconds. I know that App Engine autoscaling has a 60-second request timeout. However, we have configured autoscaling with a minimum of 2 instances and no max-instances limit. I would expect the running instances to handle the requests, or App Engine to scale up to handle them. Why is that not happening?