As I understand it, an integer is terminated by a byte whose most-significant bit is zero. Thus, bytes must be read one at a time, and this condition must be checked after each read to determine whether to read another. Why was this encoding chosen over a variable-width encoding that requires at most two reads -- that is, one that specifies in its first byte how many subsequent bytes to read?
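For reference, here is a minimal sketch (in Go, purely my choice of language; the function name is mine) of the one-byte-at-a-time decode loop I mean:

    package varint

    import (
        "errors"
        "io"
    )

    // readVarint is a sketch of the base-128 varint decoding described above:
    // keep reading bytes and accumulating their low 7 bits until a byte
    // arrives whose most-significant bit is zero.
    func readVarint(r io.ByteReader) (uint64, error) {
        var value uint64
        for shift := uint(0); shift < 64; shift += 7 {
            b, err := r.ReadByte()
            if err != nil {
                return 0, err
            }
            value |= uint64(b&0x7F) << shift // low 7 bits are payload
            if b&0x80 == 0 {                 // continuation bit clear: done
                return value, nil
            }
        }
        return 0, errors.New("varint too long")
    }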
No, I don't mean for the first byte's value to be the length of the rest of the integer. Rather, the number of leading ones in the first byte could be the number of following bytes. This would still allow 7 bits of a value to be stored per byte, with the added bonus of a full 64-bit value being encoded in 9 bytes instead of 10.
Examples:
0 leading ones followed by a terminating zero and then 7 bits:
0b0.......
1 leading one followed by a terminating zero, then 6 bits, and then 1 byte:
0b10...... ........
7 leading ones followed by a terminating zero and then 7 bytes:
0b11111110 ........ ........ ........ ........ ........ ........ ........
8 leading ones followed by 8 bytes:
0b11111111 ........ ........ ........ ........ ........ ........ ........ ........
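To make the proposal concrete, here is a decoding sketch of the scheme above (again in Go; the function name and the bit layout -- leftover bits of the first byte as the high-order bits, following bytes appended big-endian -- are my own assumptions, not anything Protocol Buffers defines). It needs at most two reads: one for the first byte and one for everything it announces:

    package prefixvarint

    import (
        "io"
        "math/bits"
    )

    // readPrefixVarint decodes the proposed scheme: the count of leading one
    // bits in the first byte is the number of bytes that follow, so at most
    // two reads are needed.
    func readPrefixVarint(r io.Reader) (uint64, error) {
        var first [1]byte
        if _, err := io.ReadFull(r, first[:]); err != nil {
            return 0, err
        }
        extra := bits.LeadingZeros8(^first[0]) // number of leading ones

        var value uint64
        if extra < 8 {
            // Keep the bits below the prefix and its terminating zero.
            value = uint64(first[0] & (0xFF >> uint(extra+1)))
        }
        if extra == 0 {
            return value, nil // single-byte case: one read total
        }

        var rest [8]byte
        if _, err := io.ReadFull(r, rest[:extra]); err != nil {
            return 0, err
        }
        for _, b := range rest[:extra] { // second and final read
            value = value<<8 | uint64(b)
        }
        return value, nil
    }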
So, such an encoding is clearly possible. Why does Protocol Buffers use something different? Is it to provide some level of protection against dropped bytes? Or has all of the data usually been read into a buffer by the time it is decoded, so that reducing the number of reads would not provide much of a speed boost?