A couple of my own thoughts:
A single GenServer (or any process, for that matter) runs concurrently with other processes, but is sequential internally. This is an inherent property of processes. The consequence is that actions are serialized (which is good), but a single process can become a bottleneck.
If many processes depend on a single one, then that process will be a bottleneck, and the system might not scale well.
There are some ways around it:
1. If actions need not be serialized, run them in different processes.
2. If writes must be serialized, but not reads, use ETS, and synchronize only writes (this is essentially what Martin mentioned).
3. If actions must be serialized, consider whether the entire action needs serializing or just some part of it. If the latter, move the code that can run concurrently out of the process. Also, try optimizing the code that must be serialized.
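Option 2 above can be sketched roughly as follows. This is only a minimal illustration, not anyone's production code: the module name `KvStore` and the table name `:kv_store_sketch` are made up for the example. Reads go straight to ETS from the caller's process, while writes are funneled through the owner process.

```elixir
defmodule KvStore do
  # Hypothetical module for illustration: concurrent reads, serialized writes.
  use GenServer

  @table :kv_store_sketch

  def start_link, do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  # Runs in the caller's process, so any number of readers proceed concurrently.
  def get(key) do
    case :ets.lookup(@table, key) do
      [{^key, value}] -> {:ok, value}
      [] -> :error
    end
  end

  # Goes through the owner process, so writes are applied one at a time.
  def put(key, value), do: GenServer.call(__MODULE__, {:put, key, value})

  @impl true
  def init(:ok) do
    # :protected lets every process read, but only the owner may write.
    :ets.new(@table, [:named_table, :protected, :set, read_concurrency: true])
    {:ok, nil}
  end

  @impl true
  def handle_call({:put, key, value}, _from, state) do
    :ets.insert(@table, {key, value})
    {:reply, :ok, state}
  end
end

{:ok, _pid} = KvStore.start_link()
before_write = KvStore.get(:answer)
:ok = KvStore.put(:answer, 42)
after_write = KvStore.get(:answer)
```

The write uses a call here only so the caller knows the value is visible before reading it back; if that guarantee is not needed, a cast would do.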
Multiple gen_servers (or processes) are of course scalable, as long as you have thousands of processes that are mostly independent. A napkin diagram showing inter-process dependencies will immediately reveal possible bottlenecks. This is why processes are great: you can easily reason about the concurrency of your entire system.
Regarding calls vs casts, I prefer to use casts unless a response is needed. Calls are performance and scalability killers, and potential deadlock sources. Note that sometimes you may want to use a call to return the success of a write. However, if you treat success and failure of the operation equally, then just use casts. Making some custom scheme of turning a call into a cast, only to send back the message later, is just reinventing the wheel. However, you may want to issue an operation, then do something else, and pick up the response later. In this scenario, you should check xgen's tasks.
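The "issue an operation, do something else, pick up the response later" flow looks roughly like this. The sketch assumes the `Task` module from Elixir's standard library as a stand-in for xgen's tasks; the sleep and the return value are placeholders.

```elixir
# Start the work in a separate process and get a handle back immediately.
task =
  Task.async(fn ->
    # Stands in for a slow request to some server.
    Process.sleep(100)
    :response
  end)

# The caller is free to do other work while the task runs.
other_result = Enum.sum(1..10)

# Pick up the response later; this exits if no reply arrives within 5 seconds.
response = Task.await(task, 5_000)
```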
I strongly disagree that calls are a good tool for backpressure control (despite seeing this pattern mentioned). It is a hacky and implicit way of limiting a client, and it can't help in all situations (e.g. many clients attacking a single server). Furthermore, a timeout will not remove the message from the queue: the server will still process the request after the caller has given up.
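The point about timeouts is easy to check. In this small sketch (the `SlowServer` module and its messages are hypothetical), the server is kept busy for 200 ms, a client calls it with a 50 ms timeout and gives up, and the request is handled anyway once the server gets to it.

```elixir
defmodule SlowServer do
  # Hypothetical server that counts how many :work requests it has handled.
  use GenServer

  def start_link, do: GenServer.start_link(__MODULE__, 0)

  @impl true
  def init(handled), do: {:ok, handled}

  @impl true
  def handle_cast(:busy, handled) do
    # Keeps the server occupied so that later requests pile up in its mailbox.
    Process.sleep(200)
    {:noreply, handled}
  end

  @impl true
  def handle_call(:work, _from, handled), do: {:reply, :done, handled + 1}
  def handle_call(:handled, _from, handled), do: {:reply, handled, handled}
end

{:ok, pid} = SlowServer.start_link()
GenServer.cast(pid, :busy)

# The client gives up after 50 ms; GenServer.call/3 exits on timeout.
outcome =
  try do
    GenServer.call(pid, :work, 50)
  catch
    :exit, {:timeout, _} -> :timed_out
  end

# The timed-out request stayed in the mailbox and is processed before this call.
handled = GenServer.call(pid, :handled)
```

Here `outcome` is `:timed_out`, yet `handled` is `1`: the client stopped waiting, but the server did the work regardless.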
For explicit backpressure and load management, I use a middle-man process. This is formulated in a library called workex (https://github.com/sasa1977/workex), which gave me good results in production. Having a middle-man process induces some performance penalty, but it gives you control over your message queue. You can prioritize, bundle, and discard messages as you please. There is also a popular jobs library by Ulf Wiger (https://github.com/esl/jobs). I didn't use it, but given Ulf's reputation, I'd trust it to be good, most probably better than what I wrote.
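To give a feel for the middle-man idea, here is a minimal sketch. It is not workex's API: the `Middleman` module, `push/2`, and the `max_queue` parameter are invented for illustration. Clients cast messages to the middle man, which buffers them while the worker is busy, discards the oldest ones past a limit, and hands the rest over as one bundle when the worker becomes free.

```elixir
defmodule Middleman do
  # Hypothetical sketch of a middle-man process; not taken from workex.
  use GenServer

  # worker_fun receives a bundle (list) of messages, oldest first.
  def start_link(worker_fun, max_queue) do
    GenServer.start_link(__MODULE__, {worker_fun, max_queue})
  end

  def push(pid, msg), do: GenServer.cast(pid, {:push, msg})

  @impl true
  def init({worker_fun, max_queue}) do
    {:ok, %{fun: worker_fun, max: max_queue, queue: [], busy: false}}
  end

  @impl true
  def handle_cast({:push, msg}, state) do
    # The queue is kept newest first; anything beyond `max` (the oldest) is discarded.
    queue = Enum.take([msg | state.queue], state.max)
    {:noreply, maybe_dispatch(%{state | queue: queue})}
  end

  @impl true
  def handle_info(:worker_done, state) do
    {:noreply, maybe_dispatch(%{state | busy: false})}
  end

  # Hand the whole buffer to the worker only when it is idle and there is work.
  defp maybe_dispatch(%{busy: false, queue: [_ | _] = queue} = state) do
    middleman = self()
    fun = state.fun
    bundle = Enum.reverse(queue)

    # Linked on purpose: a crashing worker takes the middle man down with it.
    spawn_link(fn ->
      fun.(bundle)
      send(middleman, :worker_done)
    end)

    %{state | queue: [], busy: true}
  end

  defp maybe_dispatch(state), do: state
end

# Usage: a slow worker that reports each bundle it receives back to this process.
caller = self()

{:ok, mm} =
  Middleman.start_link(
    fn bundle ->
      send(caller, {:bundle, bundle})
      Process.sleep(200)
    end,
    3
  )

for n <- 1..6, do: Middleman.push(mm, n)

first = receive do {:bundle, b} -> b after 1_000 -> :none end
second = receive do {:bundle, b} -> b after 1_000 -> :none end
```

The first message goes through alone because the worker is idle. The remaining five arrive while it is busy, so only the newest three survive and are delivered as a single bundle. The discard policy is where you would plug in prioritization or any other rule.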