Google Cloud Endpoints now supports rate limiting. It is a great way to keep individual clients from swamping your backend, and it lets you control how much of your API any given client can use.
You can set simple quotas that limit the overall number of requests per client, but you can also handle more complex cases. If you need different rates for different methods (for example, allowing 100 read requests per minute but only 10 write requests per minute), you can assign different quota costs to different operations.
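To illustrate how per-method costs divide a single budget, here is a minimal Python sketch. It is a local simulation only, not the Endpoints implementation or any Google API: the class name `CostQuota`, the 100-unit limit, and the costs of 1 unit per read and 10 units per write are assumptions chosen to reproduce the example above. One instance stands for one client's budget within one minute.

```python
# Hypothetical local model of cost-weighted quota; NOT the Cloud Endpoints API.
LIMIT_PER_MINUTE = 100                   # assumed units per client per minute
METHOD_COSTS = {"read": 1, "write": 10}  # assumed per-method quota costs


class CostQuota:
    """Tracks one client's quota usage within a single one-minute window."""

    def __init__(self, limit, costs):
        self.limit = limit
        self.costs = costs
        self.used = 0

    def try_consume(self, method):
        """Charge the method's cost if it fits in the remaining budget."""
        cost = self.costs[method]
        if self.used + cost > self.limit:
            return False  # request would exceed the quota: reject it
        self.used += cost
        return True


if __name__ == "__main__":
    quota = CostQuota(LIMIT_PER_MINUTE, METHOD_COSTS)
    allowed = sum(quota.try_consume("read") for _ in range(150))
    print(f"reads allowed this minute: {allowed}")
```

With a 100-unit budget, a client can make 100 reads or 10 writes in a minute, or any mix whose total cost stays within 100 units.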
If you are using an OpenAPI spec, quotas are specified in your OpenAPI specification file using a vendor extension. For gRPC, they go in your API configuration YAML file. If you are using our Endpoints Frameworks for App Engine, they are specified as annotations in your code.
Your spec sets a default quota that applies to all consumers of your API. To raise or lower the quota for a particular client, use the Google Cloud Console.
There are a couple of limitations. First, quotas are currently applied per consuming project, so for them to take effect you must use API keys, and each client's keys must be generated in a separate project. Second, quotas are specified and enforced per minute.
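To make the per-project, per-minute behavior concrete, here is a hedged Python sketch of a fixed one-minute window keyed by consuming project, with a default limit and optional per-project overrides (standing in for the Cloud Console adjustments described above). It is an illustrative model, not how Endpoints is implemented: `ProjectMinuteQuota`, its parameters, and the project IDs are hypothetical, and it assumes the consuming project has already been resolved from the caller's API key.

```python
# Hypothetical model of per-project, per-minute quota; NOT the Cloud Endpoints API.
class ProjectMinuteQuota:
    """Fixed one-minute windows, counted separately for each consuming project."""

    def __init__(self, default_limit, overrides=None):
        self.default_limit = default_limit      # assumed default from the spec
        self.overrides = dict(overrides or {})  # assumed per-project limits
        self.usage = {}                         # project_id -> (minute, units used)

    def limit_for(self, project_id):
        return self.overrides.get(project_id, self.default_limit)

    def try_consume(self, project_id, now_seconds, cost=1):
        """Charge `cost` units to the project for the minute containing now_seconds."""
        window = int(now_seconds // 60)
        current_window, used = self.usage.get(project_id, (window, 0))
        if current_window != window:
            used = 0  # a new minute has started, so the counter resets
        if used + cost > self.limit_for(project_id):
            self.usage[project_id] = (window, used)
            return False  # over quota for this minute
        self.usage[project_id] = (window, used + cost)
        return True
```

Because usage is keyed by project, two clients sharing keys from the same project would also share one budget, which is why keys for different clients need to come from different projects.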
Let us know what you think!