Hi Yossi,
1. You are correct: we built replication specifically to enable horizontal scaling. (There will be at least one other replication mode in the future, designed for disaster recovery.)
2. Yes and yes -- you can add another replication cluster C on the fly without bringing down A and B. You can go through ELB and have it round-robin requests, but depending on your workload, you may see better performance by taking advantage of locality -- having a replication cluster close to each geographic location and accessing it directly.
3. For Transit specifically, anything that modifies a key is considered to be a write operation -- so creating a key, rotating it, or changing its configuration parameters. Everything else, including all encryption/decryption/signing/verification operations, can be handled on the secondaries. Note that you don't need to manually send write/read operations on different paths as in your diagram -- secondary clusters will transparently forward such write operations to the primary cluster!
4. That's all highly dependent on your setup -- whether you are using service discovery and how you have that hooked into ELB, health checks you have configured on the ELB, DNS, etc.
5. The Vault team does not generally perform benchmarks because performance is so highly dependent on many factors -- not just the workload against Vault, but which storage backend is used, the latency/connection to it, sizing of machines/instances, resources of machines/instances, provisioned IOPS (if applicable), tenancy, etc. In other words, as with most applications, benchmarks are not very useful in a general sense; what ends up being far more important is whether the speed you're getting is good enough for your use cases. (In one minor exception to the "we don't run benchmarks" rule, I did once run some benchmarks of Transit between two distinct machines in a highly favorable test environment, and was able to push 37k operations per second. But to illustrate my earlier point, it's an environment that you would never encounter in real life, so it's likely only useful as an idealized upper bound.) All that said, our sales engineers work with customers on proofs of concept that I think include ensuring that performance meets needs. Since you're looking into replication, you're looking into Vault Enterprise, so I do encourage getting in touch with them at
https://www.hashicorp.com/products/vault/
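
To make point 3 a bit more concrete, here is a minimal Python sketch of the split between key-modifying requests and everything else. It is only an illustration, not how Vault decides what to forward: the rule in modifies_key is a simplification of my description above, it assumes Transit is mounted at the default "transit/" path, and the cluster address, token, and key name in the usage note are hypothetical. The request is built but never sent.

```python
import base64
import json
import urllib.request


def modifies_key(method: str, path: str) -> bool:
    """Return True if a Transit request changes a key (create, rotate, config).

    Illustrative rule only: non-read requests under transit/keys/ are treated
    as key-modifying. Vault itself makes the real forwarding decision.
    """
    parts = path.strip("/").split("/")
    under_keys = len(parts) >= 3 and parts[0] == "transit" and parts[1] == "keys"
    return under_keys and method.upper() not in ("GET", "LIST")


def build_encrypt_request(addr: str, token: str, key: str, plaintext: bytes) -> urllib.request.Request:
    """Build (but do not send) a Transit encrypt request for one cluster address."""
    # Transit expects the plaintext to be base64-encoded in the JSON body.
    body = json.dumps({"plaintext": base64.b64encode(plaintext).decode("ascii")})
    return urllib.request.Request(
        url=f"{addr}/v1/transit/encrypt/{key}",
        data=body.encode("utf-8"),
        headers={"X-Vault-Token": token},
        method="POST",
    )
```

For example, build_encrypt_request("https://vault-b.example.com:8200", token, "orders", b"hi") targets a (hypothetical) secondary directly, and modifies_key("POST", "transit/encrypt/orders") is False. A call such as POST transit/keys/orders/rotate would be classified as key-modifying, but the client can still send it to the same secondary address, since the secondary forwards it to the primary.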
Best,
Jeff