Hello, hope everything is well with you.
I am currently using gRPC to transfer LLMs (large language models) with 1 to 7 billion parameters. I know protobuf has a 2 GB serialization limit per message, which is why I chunk the model myself. My chunking works like this: say a model has 100 layers, each smaller than 2 GB; I send `batch_size` layers per message. For example, if I can fit 5 layers in one message, I need 20 rounds to transfer all 100 layers. Note that this chunking is done in my own Python code, not by gRPC itself.
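For context, here is roughly what my current layer-level batching looks like. This is a minimal sketch, not my exact code: `model_pb2`, the `ModelChunk` message, the `SendChunk` RPC, and the use of `pickle` are all placeholders for my actual setup.

```python
import pickle  # placeholder serialization; the real code may serialize differently

import model_pb2  # hypothetical module generated from my .proto file

BATCH_SIZE = 5  # layers per message, chosen so each message stays under 2 GB


def send_model(stub, layers):
    """Send a list of per-layer weight blobs in batches of BATCH_SIZE layers."""
    for start in range(0, len(layers), BATCH_SIZE):
        batch = layers[start:start + BATCH_SIZE]
        payload = pickle.dumps(batch)  # one serialized payload per batch
        # `ModelChunk` / `SendChunk` are placeholder names for the proto message and RPC
        stub.SendChunk(model_pb2.ModelChunk(data=payload))
```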
But now I have a model in which a single layer is itself larger than 2 GB, so layer-level batching no longer works, and I am not sure how to proceed.
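My current thought is to drop from layer-level to byte-level chunking: serialize the oversized layer, split the resulting bytes into fixed-size slices, and send them over a client-streaming RPC that the server reassembles before deserializing. Here is a minimal sketch of the idea; the `LayerChunk` message, the `UploadLayer` client-streaming RPC, and the 3 MB chunk size are assumptions, not tested code.

```python
import model_pb2  # hypothetical module generated from my .proto file

CHUNK_SIZE = 3 * 1024 * 1024  # 3 MB per message, safely under gRPC's default 4 MB cap


def layer_chunks(layer_bytes: bytes):
    """Yield fixed-size byte slices of one serialized layer as stream messages."""
    for start in range(0, len(layer_bytes), CHUNK_SIZE):
        # `LayerChunk` is a placeholder proto message with a single `bytes data` field
        yield model_pb2.LayerChunk(data=layer_bytes[start:start + CHUNK_SIZE])


def send_large_layer(stub, layer_bytes: bytes):
    # `UploadLayer` would be a client-streaming RPC; the server concatenates the
    # `data` fields in order and deserializes the layer once the stream ends
    stub.UploadLayer(layer_chunks(layer_bytes))
```

Since each message then carries only a small slice of raw bytes, the 2 GB protobuf limit would apply per message rather than to the whole layer.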
Can anyone please give me some pointers on how to leverage gRPC chunking (or streaming) for this case, or tell me whether the sketch above is a reasonable direction?
Kind regards,
Saurav