I am assuming you are using gRPC Java:
* Setting the window size correctly will give you the biggest win; it should be roughly equal to the bandwidth-delay product (BDP). 64K was picked as a generally safe default, but it isn't right for every environment. There is work under way to tune this automatically, but it isn't available yet.
* If you have exactly 1 RPC active at a time, there are optimizations to make the DATA frames larger. The maximum frame size is 16K by default and is set by the remote side's SETTINGS_MAX_FRAME_SIZE. You can raise it (though TBH, I have never tried and don't know how) so that each RPC fits in a single frame and doesn't need to be cut up.
* If you have more than 1 RPC active, each message is cut up into 1K chunks so that each RPC gets fairer access to the wire. This has been changed on master and will ship in 1.5; you can build from master to try it out now. This ONLY helps if more than one RPC is active.
* If you are pushing more than 10 Gbps, you can run into TLS bottlenecks. This almost certainly doesn't apply to most people. You can work around it by creating multiple channels, but you give up in-order delivery across them. I would treat this as a last resort.
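To make the window-size point concrete, here is a back-of-the-envelope BDP calculation. It's a sketch, not anything shipped in gRPC: `BdpWindow`, `bdpBytes` and `windowFor` are hypothetical names, and the 1 Gbps / 50 ms figures are just example inputs. The only hard limit it applies is HTTP/2's maximum window of 2^31 - 1 bytes.

```java
/**
 * Back-of-the-envelope sizing of an HTTP/2 flow-control window from the
 * bandwidth-delay product. Hypothetical helper; not part of gRPC.
 */
public class BdpWindow {
    /** The 64K default window mentioned above, in bytes. */
    static final int DEFAULT_WINDOW = 64 * 1024;

    /** BDP in bytes = (bits per second / 8) * RTT in seconds. */
    static long bdpBytes(long bitsPerSecond, long rttMillis) {
        // Multiply by the RTT before dividing by 1000 so short RTTs don't round to zero.
        return bitsPerSecond / 8 * rttMillis / 1000;
    }

    /** Clamps the BDP to HTTP/2's largest legal window, 2^31 - 1 bytes. */
    static int windowFor(long bitsPerSecond, long rttMillis) {
        return (int) Math.min(bdpBytes(bitsPerSecond, rttMillis), Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        // Example: a 1 Gbps link with a 50 ms round-trip time.
        int window = windowFor(1_000_000_000L, 50);
        System.out.println("window = " + window + " bytes (default " + DEFAULT_WINDOW + ")");
        // prints: window = 6250000 bytes (default 65536)
    }
}
```

So on that example link the 64K default is roughly 100x too small. A window limits how much the *peer* may send, so it matters most on whichever side receives the bulk of the data. With grpc-java's Netty transport the value would be passed to something like `NettyChannelBuilder.forAddress(host, port).flowControlWindow(window)` (and the equivalent on `NettyServerBuilder`); check the Javadoc for your version.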
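The chunking trade-off can be illustrated with a toy model: pending messages are cut into fixed-size chunks and written round-robin across streams. This is a hypothetical simulation (`ChunkInterleaver`, `schedule` and `bytesUntilDone` are made-up names), not gRPC's actual write path. It shows why small chunks help a small RPC sharing the wire with a big one, and what they cost in frame count.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of chunked, round-robin writes across streams. Not gRPC's real writer. */
public class ChunkInterleaver {
    /** One chunk on the wire: which stream it belongs to and how many bytes it carries. */
    static final class Chunk {
        final int streamId;
        final int bytes;
        Chunk(int streamId, int bytes) { this.streamId = streamId; this.bytes = bytes; }
    }

    /** Cuts each pending message into chunkSize pieces and writes them round-robin. */
    static List<Chunk> schedule(int[] messageSizes, int chunkSize) {
        Deque<int[]> pending = new ArrayDeque<>(); // each entry: {streamId, bytesRemaining}
        for (int id = 0; id < messageSizes.length; id++) {
            if (messageSizes[id] > 0) pending.add(new int[] {id, messageSizes[id]});
        }
        List<Chunk> wire = new ArrayList<>();
        while (!pending.isEmpty()) {
            int[] s = pending.poll();
            int n = Math.min(chunkSize, s[1]);
            wire.add(new Chunk(s[0], n));
            s[1] -= n;
            if (s[1] > 0) pending.add(s); // unfinished streams go to the back of the line
        }
        return wire;
    }

    /** Total bytes written up to and including the last chunk of the given stream. */
    static long bytesUntilDone(List<Chunk> wire, int streamId) {
        long total = 0;
        long doneAt = -1;
        for (Chunk c : wire) {
            total += c.bytes;
            if (c.streamId == streamId) doneAt = total;
        }
        return doneAt;
    }

    public static void main(String[] args) {
        int[] sizes = {64 * 1024, 1024}; // stream 0: large message, stream 1: small one
        for (int chunk : new int[] {1024, 16 * 1024, 64 * 1024}) {
            List<Chunk> wire = schedule(sizes, chunk);
            System.out.println("chunk=" + chunk + " frames=" + wire.size()
                + " smallDoneAfter=" + bytesUntilDone(wire, 1));
        }
    }
}
```

With 1K chunks the small message is done after 2K bytes hit the wire but the transfer takes 65 frames; with 64K chunks it's only 2 frames, but the small message waits behind all 64K of the big one. With a single active stream there is nobody to be fair to, which is why chunking only matters when more than one RPC is active.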
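If you do end up needing multiple channels, one simple way to spread independent RPCs over them is a round-robin pool. This is a hypothetical sketch: `RoundRobinPool` is not a gRPC class, and in real code `T` would typically be a `ManagedChannel` (for example one built per connection with `ManagedChannelBuilder.forAddress(host, port)`). Strings stand in below so the example runs on its own.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical round-robin pool that spreads calls over several channels. */
public final class RoundRobinPool<T> {
    private final List<T> channels;
    private final AtomicInteger next = new AtomicInteger();

    public RoundRobinPool(List<T> channels) {
        if (channels.isEmpty()) throw new IllegalArgumentException("need at least one channel");
        this.channels = List.copyOf(channels); // immutable snapshot
    }

    /** Picks the next channel; safe to call from many threads at once. */
    public T pick() {
        // floorMod keeps the index non-negative even after the counter wraps around
        int i = Math.floorMod(next.getAndIncrement(), channels.size());
        return channels.get(i);
    }

    public static void main(String[] args) {
        RoundRobinPool<String> pool = new RoundRobinPool<>(List.of("ch0", "ch1", "ch2"));
        for (int i = 0; i < 4; i++) {
            System.out.println(pool.pick()); // ch0, ch1, ch2, ch0
        }
    }
}
```

Each call is pinned to whichever channel it was started on, so a single streaming RPC stays in order; it's the ordering *between* calls on different channels that you lose. Only spread RPCs that don't depend on each other's order.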
What kind of bottlenecks do you see, and what are your target goals?