gRPC-java seems slow with multi-level proto

Avinash Dongre

Sep 24, 2016, 2:29:53 AM
to grpc.io
Hi All,

Thanks, all, for replying to and resolving my previous doubts. I am planning to use gRPC for our project, and I have the following question about performance.

Suppose I have the following proto definition:

    option optimize_for = SPEED;

    message ScanRow {
        repeated bytes row = 1;
    }

    message ScanResult {
        repeated ScanRow row = 1;
    }

    message ScanRequest {
        int32 numOfColumns = 1;
        int32 sizeOfEachColumn = 2;
        int64 numOfRows = 3;
        int32 batchSize = 4;
    }

    service ScanService {
        rpc Scan (ScanRequest) returns (stream ScanResult) {}
    }

With this definition I get around 900-1000 MB/s on a single machine, and around 100-110 MB/s across two different physical machines.

If I change the proto definition above to the following:

    option optimize_for = SPEED;

    message ColumnValues {
        bytes columnName = 1;
        bytes columnValue = 2;
    }

    message ScanRow {
        int64 rowId = 1;
        int64 timeStamp = 2;
        repeated ColumnValues columnValue = 3;
    }

    message ScanResult {
        repeated ScanRow row = 1;
    }

    message ScanRequest {
        int32 numOfColumns = 1;
        int32 sizeOfEachColumn = 2;
        int64 numOfRows = 3;
        int32 batchSize = 4;
    }

    service ScanService {
        rpc Scan (ScanRequest) returns (stream ScanResult) {}
    }

Now I get only around 130-135 MB/s.

Why is it slower with this kind of multi-level proto definition? Is there extra serialization/deserialization overhead for nested messages?

Thanks
Avinash
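
One way to reason about the serialization question is to estimate the encoded size of a row under each schema. The sketch below applies the standard protobuf wire rules (a one-byte tag for field numbers below 16, a varint length prefix for `bytes` and embedded messages) by hand. The class and method names are hypothetical, the column counts and sizes passed in are illustrative, and it assumes non-empty byte fields and non-zero `rowId`/`timeStamp` values, since default values may not be written at all.

```java
// Hypothetical helper, not part of protobuf or gRPC: estimates encoded row sizes
// for the two ScanRow schemas in this thread using the protobuf wire format rules.
final class WireSizeSketch {
    // Number of bytes needed to encode a value as a base-128 varint.
    static int varintSize(long value) {
        int size = 1;
        while ((value & ~0x7FL) != 0) {
            size++;
            value >>>= 7;
        }
        return size;
    }

    // One length-delimited field (bytes or embedded message) with a field
    // number below 16: 1 tag byte + varint length + payload.
    static long lengthDelimitedSize(long payloadSize) {
        return 1 + varintSize(payloadSize) + payloadSize;
    }

    // First schema: ScanRow { repeated bytes row = 1; }
    static long rawRowSize(int numColumns, int valueSize) {
        return (long) numColumns * lengthDelimitedSize(valueSize);
    }

    // Second schema: ScanRow { rowId = 1; timeStamp = 2; repeated ColumnValues columnValue = 3; }
    // Assumes rowId and timeStamp are non-zero so that both are actually encoded.
    static long structuredRowSize(int numColumns, int nameSize, int valueSize,
                                  long rowId, long timeStamp) {
        long columnBody = lengthDelimitedSize(nameSize) + lengthDelimitedSize(valueSize);
        long columns = (long) numColumns * lengthDelimitedSize(columnBody);
        long ids = (1 + varintSize(rowId)) + (1 + varintSize(timeStamp));
        return ids + columns;
    }
}
```

Under these assumptions, the nested form adds a few bytes of framing per column plus the column name itself. That suggests, without proving it, that bytes on the wire alone may not explain a large throughput gap; the cost of building and parsing many more small objects per row would need profiling to confirm.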

Avinash Dongre

Sep 24, 2016, 2:31:02 AM
to grpc.io
>>> Now I get around 130-135 MegaBytes/Seconds Speed.
This result is on the same machine, i.e. the gRPC client and the gRPC server are both running on the same machine.

Avinash Dongre

Sep 26, 2016, 6:57:23 AM
to grpc.io
Hi All,
Please help.

Thanks
Avinash

Louis Ryan

Sep 26, 2016, 12:38:47 PM
to Avinash Dongre, grpc-io
I see that you're using a streaming RPC. Can you send the code from the server and client? How you interact with flow control can greatly affect performance.

-louis (from phone)

--
You received this message because you are subscribed to the Google Groups "grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email to grpc-io+unsubscribe@googlegroups.com.
To post to this group, send email to grp...@googlegroups.com.
Visit this group at https://groups.google.com/group/grpc-io.
To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/cc74ecdf-3eac-4475-bad0-23fd66ee03b2%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Avinash Dongre

Oct 2, 2016, 6:16:41 AM
to grpc.io, dongre....@gmail.com
Thanks, Louis, for helping me here.

Sorry for the delayed response.

I am sure I am doing something wrong in my code, but I have not been able to figure out what yet.

I have checked in my benchmark project at https://github.com/davinash/grpc-bench

With that code, I get the following numbers for the raw variant:

Num Of Rows     -> 2000000
Time            -> 68.468212235 Seconds
Total Data Size -> 65536000000
Data Rate       -> 912.8323635132227 MBps



And with the formatted variant I get the following numbers:

Num Of Rows     -> 2000000
Time            -> 358.549532039 Seconds
Total Data Size -> 81952000000
Data Rate       -> 217.97546228982097 MBps
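
The reported "MBps" figures appear consistent with dividing the total byte count by 1024 * 1024 and then by the elapsed seconds, which is an inference from the numbers above rather than something stated in the thread. A small sketch of that arithmetic, with hypothetical class and method names:

```java
// Hypothetical helper that reproduces the data-rate arithmetic from the posted results.
final class DataRateSketch {
    static final long BYTES_PER_MIB = 1024L * 1024L;

    // Throughput in MiB per second for a transfer of totalBytes over the given duration.
    static double mibPerSecond(long totalBytes, double seconds) {
        return (double) totalBytes / BYTES_PER_MIB / seconds;
    }

    // Average payload size per row, using integer division.
    static long bytesPerRow(long totalBytes, long numRows) {
        return totalBytes / numRows;
    }
}
```

For example, `DataRateSketch.mibPerSecond(65536000000L, 68.468212235)` gives roughly 912.8 for the raw run, and `bytesPerRow` gives 32768 bytes per row for the raw run versus 40976 for the formatted one. So the formatted run moves about 25% more bytes per row but takes more than five times as long.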


Thanks
Avinash

Louis Ryan

Oct 3, 2016, 7:18:50 PM
to Avinash Dongre, grpc.io
I see you're using a blocking stub to make the streaming calls. I suggest you switch to an async stub with a callback instead.


E.g.

    asyncStub.formatedScan(
        ScanRequest.newBuilder()
            .setNumOfColumns(NUMBER_OF_COLUMNS)
            .setNumOfRows(NUM_OF_ROWS)
            .setSizeOfEachColumn(SIZE_OF_EACH_COLUMN)
            .build(),
        new StreamObserver<ScanFormattedResponse>() {
          // ...
          @Override
          public void onNext(ScanFormattedResponse response) {
            // ... do stuff with the response, i.e. replace the while loop in your code
          }
        });
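
When moving from a blocking iterator to a callback, the calling thread no longer waits for the stream by itself, so a benchmark usually needs some way to block until the stream finishes before stopping its timer. The sketch below shows one possible way to do that with a `CountDownLatch`. To stay self-contained it uses a hypothetical `ResponseObserver` interface that only mirrors the three callbacks of grpc-java's `io.grpc.stub.StreamObserver` (`onNext`, `onError`, `onCompleted`), and it uses `byte[]` in place of the generated response type.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for io.grpc.stub.StreamObserver so the sketch compiles without gRPC.
interface ResponseObserver<T> {
    void onNext(T value);
    void onError(Throwable t);
    void onCompleted();
}

// Counts received bytes and lets the caller wait for the end of the stream.
final class CountingObserver implements ResponseObserver<byte[]> {
    private final CountDownLatch done = new CountDownLatch(1);
    private final AtomicLong bytesReceived = new AtomicLong();
    private volatile Throwable failure;

    @Override
    public void onNext(byte[] value) {
        // Per-message work goes here; this replaces the body of a blocking while loop.
        bytesReceived.addAndGet(value.length);
    }

    @Override
    public void onError(Throwable t) {
        failure = t;
        done.countDown();
    }

    @Override
    public void onCompleted() {
        done.countDown();
    }

    // Returns true if the stream ended (normally or with an error) within the timeout.
    boolean await(long timeout, TimeUnit unit) {
        try {
            return done.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    long bytesReceived() {
        return bytesReceived.get();
    }

    Throwable failure() {
        return failure;
    }
}
```

With a real stub, the same class would implement `StreamObserver<ScanFormattedResponse>` and be passed as the second argument of the async call, and the benchmark would call `await(...)` before computing the data rate.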


Avinash Dongre

Oct 4, 2016, 4:31:49 AM
to grpc.io, dongre....@gmail.com
Hi Louis,

Thanks. I have made the changes and I see good numbers with the async client. I have committed the change to my repo.

But I still see a very low data transfer rate. What I have observed is that when the data is large, say 1024 bytes, I get around 900-1000 MB/s on the same machine.

Thanks
Avinash

Num Of Rows     -> 2000000
Total Data Size -> 512000000

Time            -> 7.484403756 Seconds
Data Rate       -> 65.20225470316008 MBps

Time            -> 5.776407626 Seconds
Data Rate       -> 84.48157256137519 MBps

Time            -> 6.010102214 Seconds
Data Rate       -> 81.19662239075524 MBps

Time            -> 5.401799312 Seconds
Data Rate       -> 90.34026845757057 MBps

Time            -> 5.352594307 Seconds
Data Rate       -> 91.17074300994655 MBps

Louis Ryan

Oct 4, 2016, 7:28:23 PM
to Avinash Dongre, grpc.io
Have you tested the bandwidth between the two machines using Netperf?

Your handling of streaming using isReady/onReady is good. We have an improvement planned to add 'corking' (https://github.com/grpc/grpc-java/issues/994), which can substantially improve throughput when dealing with large numbers of small messages in a stream.

One thing I notice between your two APIs is that in the more structured one you send the column name with every column value, which seems quite inefficient.


Avinash Dongre

Oct 4, 2016, 9:34:44 PM
to grpc.io, dongre....@gmail.com
Thanks, Louis,
Thanks also for reviewing my code. Yes, I am going to remove the need for column names in every response.
As of now, all the tests I am running are on the same machine.

Regarding the corking mechanism, I see it is planned for 1.1. Do you have any idea when that will be available?

In the meantime, if I implement my scanning RPC server method like flowControlledStreaming in AbstractBenchmark, do you think that will help to get better throughput with large numbers of small messages in a stream?

Thanks
Avinash
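
For readers unfamiliar with the isReady/onReady pattern discussed above, here is a generic sketch of a sender that writes only while the outbound side reports it is ready and resumes when notified again. It is not taken from AbstractBenchmark or from the grpc-bench repository. `ReadyAwareSink` and `FlowControlledSender` are hypothetical names; the interface only mirrors `isReady()`, `onNext()` and `onCompleted()` as found on grpc-java's `ServerCallStreamObserver`, where `drain()` would be registered through `setOnReadyHandler`.

```java
import java.util.Iterator;

// Hypothetical stand-in for the parts of io.grpc.stub.ServerCallStreamObserver used here.
interface ReadyAwareSink<T> {
    boolean isReady();
    void onNext(T value);
    void onCompleted();
}

// Sends items from a source only while the sink reports that it can accept more.
final class FlowControlledSender<T> {
    private final Iterator<T> source;
    private final ReadyAwareSink<T> sink;
    private boolean completed;

    FlowControlledSender(Iterator<T> source, ReadyAwareSink<T> sink) {
        this.source = source;
        this.sink = sink;
    }

    // Intended to be called once up front and again each time the sink becomes ready.
    synchronized void drain() {
        if (completed) {
            return;
        }
        // Stop writing as soon as the sink is no longer ready, instead of queueing in memory.
        while (sink.isReady() && source.hasNext()) {
            sink.onNext(source.next());
        }
        // Complete the stream exactly once, after the last item has been written.
        if (!source.hasNext()) {
            completed = true;
            sink.onCompleted();
        }
    }
}
```

Whether this helps with many small messages depends on the transport. It mainly keeps the sender from buffering without bound, while packing more rows into each ScanResult (the `batchSize` field in the request) reduces the number of messages in the first place.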