What is the most efficient protobuf type (in C++) for storing an IPv4 or IPv6 address? My address is a boost::asio::ip::address_v4 (or v6)


sanjana gupta

Jul 18, 2018, 3:16:39 AM
to Protocol Buffers

I read that protobuf has a type called "bytes" which can store an arbitrary number of bytes and is the equivalent of a C++ string.

The reason I prefer not to use "bytes" is that it expects its input as a C++ string, i.e., the boost IP would need to be converted to a string.


Now my concern is this: I want to serialize the protobuf message and send it over a TCP socket, and I want the encoded message to be as small as possible.


Currently, I am using the below .proto file :

    syntax = "proto2";

    message profile
    {
        repeated uint32 localEndpoint = 1;
        repeated uint32 remoteEndpoint = 2;
    }


To store a boost IP in the protobuf message, I first convert it into a byte array using "boost::asio::ip::address_v4::to_bytes()". For a v4 IP, the resultant array size is 4. I then combine those 4 bytes into one uint32_t number and store it in the "localEndpoint" field of the protobuf message. For a v6 IP (16 bytes), I repeat this for each group of 4 bytes. I take 4 bytes at a time so as to use the full 32 bits of the uint32.


Hence for a v4 address, 1 occurrence of the "localEndpoint" field is used. Similarly, for a v6 address, 4 occurrences of the "localEndpoint" field are used.


Please allow me to highlight that if I had used "bytes" here, my input string itself would have been of size 15 bytes for a v4 ip like 111.111.111.111


Using uint32 instead of "bytes" does save me some encoded data size, but I am looking for a more efficient protobuf type that requires fewer bytes.


Sorry for the long description, but I wanted to explain my query in detail. Please help me. Thanks a lot in advance :)
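
For reference, the packing step described in this post can be sketched in standard C++ as follows. This is only an illustration: the helper name pack_be32 is made up here, and a std::array<unsigned char, 4> stands in for the result of boost::asio::ip::address_v4::to_bytes(), which holds the 4 address bytes in network byte order.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: pack 4 address bytes (network order, most
// significant byte first) into a single 32-bit integer.
std::uint32_t pack_be32(const std::array<unsigned char, 4>& b) {
    return (static_cast<std::uint32_t>(b[0]) << 24) |
           (static_cast<std::uint32_t>(b[1]) << 16) |
           (static_cast<std::uint32_t>(b[2]) << 8) |
            static_cast<std::uint32_t>(b[3]);
}
```

For 111.111.111.111 this gives 0x6F6F6F6F. A v6 address would need four such values, one per group of 4 bytes.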

Marc Gravell

Jul 18, 2018, 3:24:39 AM
to sanjana gupta, Protocol Buffers
You should be able to encode ipv4 in 4 bytes, making fixed32 ideal, since you can avoid the length prefix. For ipv6, you're going to need 16 bytes, so "bytes" is probably your best bet, since it will only require a single header. You can then create a union of those:

    oneof ip_addr {
        fixed32 v4 = 1;
        bytes v6 = 2;
    }

That seems pretty optimal to me.

Marc


sanjana gupta

Jul 18, 2018, 4:36:00 AM
to Protocol Buffers
Thanks Marc for your reply! By mistake I wrote 8 bytes for ipv4. I have corrected it to 4 bytes in my question. Thanks for that :)

One thing I want to ask: can I use "oneof" if my field "localEndpoint" is repeated?
I mean that one "profile" message can hold any of the following in the field "localEndpoint":
-> 2 ipv4 addresses
-> OR 2 ipv6 addresses
-> OR 1 ipv4 + 1 ipv6 address

So in that case, is "oneof" usage correct? I think that a "oneof" cannot be repeated. Please correct me if I am wrong.

Marc Gravell

Jul 18, 2018, 5:10:01 AM
to sanjana gupta, Protocol Buffers
At that point I'd probably use "repeated bytes", then. It'll cost you an extra byte on v4 addresses, but it is simple. 

sanjana gupta

Jul 18, 2018, 5:28:10 AM
to Protocol Buffers
Hey Marc,

I am sorry if I am repeating my words. Please enlighten me on this thing :

"bytes" requires me to give a std::string (C++) value as input. The problem with a string for me is that a usual 4-byte ipv4 address like 111.111.111.111 becomes the 15-byte string "111.111.111.111".

Having said that, I feel discouraged from using bytes over uint32 (or an int type for that matter), as it already increases the size of the data to be encoded. As a result, we can say that:
size of (encoded message using "bytes" type)  >  size of (encoded message using "int" type)

Is my understanding correct? Thanks!

Marc Gravell

Jul 18, 2018, 5:38:52 AM
to sanjana gupta, Protocol Buffers
I would be amazed if there isn't a way of getting the raw underlying bytes from the IP address, rather than the text (ASCII). The address 111.111.111.111 is literally the 4 bytes with decimal values 111, 111, 111 and 111. No ASCII required.
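
To make the difference concrete, here is a minimal sketch in standard C++. It assumes the raw bytes are already available in a std::array of unsigned char (Boost.Asio's address_v4::to_bytes() returns a 4-element array of unsigned char, which may be boost::array rather than std::array depending on configuration). The helper name raw_to_string is hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper: copy N raw address bytes into a std::string without
// any text formatting. std::string can hold arbitrary bytes, including '\0'.
template <std::size_t N>
std::string raw_to_string(const std::array<unsigned char, N>& bytes) {
    return std::string(reinterpret_cast<const char*>(bytes.data()), bytes.size());
}
```

For 111.111.111.111 the result has size() == 4 (each byte is 0x6F), whereas the dotted text form "111.111.111.111" has 15 characters.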

sanjana gupta

Jul 18, 2018, 5:58:42 AM
to Protocol Buffers
I would like to understand from the protobuf generated API point of view:

If I have,
repeated bytes localEndpoint = 1;

The generated file profile.pb.h shows me the following APIs to add+set values in it :

  // repeated bytes localEndpoint = 1;
  inline void add_localendpoint(const ::std::string& value);
  inline void add_localendpoint(const char* value);
  inline void add_localendpoint(const void* value, size_t size);

If I use any of them, my boost::asio::ip::address_v4 (111.111.111.111) presumably needs to be converted to a std::string first, and only then can I use add_localendpoint() to set a value.
Am I correct?

If the conversion to string is done, isn't the protobuf message going to be larger than in the case of using

repeated uint32 localEndpoint = 1;
?

Please let me know if I have confused you :) Thanks again!

Marc Gravell

Jul 18, 2018, 6:10:17 AM
to sanjana....@gmail.com, Protocol Buffers
The first step, then, is to find how to get your ipv4 in a 4 byte std::string rather than a 15 byte std::string - I can't advise on that, but it should absolutely be possible. You don't need it as ASCII - you just want the bytes.

As for will it be longer? No, especially when compared to uint32. Not counting the one-byte field header, which every option here pays: a 4-byte "bytes" will take 5 bytes to encode (1 byte for the length prefix, 4 bytes payload); the uint32 with value composed of 4 bytes with value 111 will be varint-encoded, so will also be 5 bytes (since the penultimate high bit is set); in fact, anything with the most-significant byte having value 16 or above will require 5 bytes to encode as uint32. Now, you *could* encode an ipv4 using "fixed32", which will always take 4 bytes, but... if you do that, you can't conveniently store ipv6 in the same field. So: since you mention needing to store both ipv4 and ipv6, "repeated bytes" is your simplest option. And as above: it isn't any more expensive than "uint32" (for most IPs).
Regards,

Marc
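
The byte counts in this reply can be checked with a few lines of standard C++. This is a sketch, not protobuf library code: the function names are invented for the example, and the sizes cover only the value part of each field, leaving out the one-byte field tag that all three options share for field numbers below 16.

```cpp
#include <cstddef>
#include <cstdint>

// Number of bytes a protobuf varint needs for `value`:
// each output byte carries 7 payload bits.
std::size_t varint_size(std::uint64_t value) {
    std::size_t n = 1;
    while (value >= 0x80) {
        value >>= 7;
        ++n;
    }
    return n;
}

// Value sizes only; the field tag is excluded (see the note above).
std::size_t uint32_payload(std::uint32_t v) { return varint_size(v); }
std::size_t fixed32_payload() { return 4; }
std::size_t bytes_payload(std::size_t len) { return varint_size(len) + len; }
```

With these, uint32_payload(0x6F6F6F6F) is 5, bytes_payload(4) is 5, fixed32_payload() is 4, and bytes_payload(16) is 17. The uint32 form drops to 4 bytes or fewer only when the value is below 2^28, that is, when the first octet is less than 16.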

sanjana gupta

Jul 18, 2018, 7:04:04 AM
to Marc Gravell, Protocol Buffers
Thanks Marc, I am gonna try fixed32.

I will update on my findings.

Paul Marks

Jul 25, 2018, 3:03:51 PM
to Protocol Buffers
Use a bytes field: 4 for IPv4, 16 for IPv6.

Ideally, you should use an IP address library with a "packed bytes as std::string" input/output.
The in_addr/in6_addr types are stored as 4 or 16 bytes in RAM.
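
On the receiving side, a single bytes field needs no extra marker to tell the two families apart, because the length is enough. A minimal sketch, where IpFamily and family_of are names invented for this example and `raw` stands for the std::string that the generated C++ accessor of a bytes field returns:

```cpp
#include <string>

enum class IpFamily { V4, V6, Invalid };

// Hypothetical helper: classify a raw address payload by its length.
IpFamily family_of(const std::string& raw) {
    switch (raw.size()) {
        case 4:  return IpFamily::V4;       // packed IPv4 address
        case 16: return IpFamily::V6;       // packed IPv6 address
        default: return IpFamily::Invalid;  // e.g. dotted text by mistake
    }
}
```

A payload of any other length, such as the 15-character text "111.111.111.111", is reported as Invalid, which also catches the accidental use of the text form.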

sanjana gupta

Jul 31, 2018, 9:10:27 AM
to pma...@google.com, Marc Gravell, Protocol Buffers
Hello Marc,

I wanted to let you know that I tried using the fixed32 and fixed64 protobuf types and it has helped me save quite some bytes on the encoded data size. Allow me to show the protobuf message I created, which is capable of storing one or multiple v4/v6 IPs :

message IpAddress
{
    message IpAddressTypes
    {
        message V6Type
        {
            repeated fixed64 v6IpAddress = 1 [packed=true];
        }
        oneof IpTypes
        {
            fixed32 v4Ip = 1;
            V6Type v6Ip = 2;
        }
    }
    repeated IpAddressTypes ipAddress = 1;
}

Thanks for your suggestion of fixed32/fixed64 types and also highlighting about "oneof".
Please correct me if you find anything wrong with my protobuf message.


Paul Marks

Jul 31, 2018, 1:46:22 PM
to Protocol Buffers
On Tuesday, July 31, 2018 at 6:10:27 AM UTC-7, sanjana gupta wrote:
> Hello Marc,
>
> I wanted to let you know that I tried using fixed32 and fixed64 protobuf types and it has helped me save quite some bytes on the encoded data size. Allow me to show the protobuf message I created which is capable of storing one or multiple v4/v6 IPs :


All you should need is:
  bytes ip = 1;  // length is 4 for IPv4, 16 for IPv6.
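
The sizes of the two layouts can be worked out by hand. The sketch below counts bytes on the wire per address for a plain bytes field and for the nested IpAddress message posted earlier in the thread, assuming every tag and length prefix fits in one byte (true here, since all field numbers are below 16 and all lengths below 128). The function names are made up for this illustration.

```cpp
#include <cstddef>

// One byte each, under the assumptions stated above.
constexpr std::size_t kTag = 1;
constexpr std::size_t kLen = 1;

// Layout A: a bytes field holding the raw address.
constexpr std::size_t bytes_entry(std::size_t addr_len) {
    return kTag + kLen + addr_len;
}

// Layout B, IPv4: ipAddress tag + length, then v4Ip tag + 4-byte fixed32.
constexpr std::size_t nested_v4_entry() {
    return kTag + kLen + (kTag + 4);
}

// Layout B, IPv6: ipAddress tag + length, v6Ip tag + length,
// then packed v6IpAddress tag + length + two 8-byte fixed64 values.
constexpr std::size_t nested_v6_entry() {
    return kTag + kLen + (kTag + kLen + (kTag + kLen + 16));
}

static_assert(bytes_entry(4) == 6 && nested_v4_entry() == 7, "v4 sizes");
static_assert(bytes_entry(16) == 18 && nested_v6_entry() == 22, "v6 sizes");
```

So, per address, the plain bytes field is 1 byte smaller for IPv4 (6 vs 7) and 4 bytes smaller for IPv6 (18 vs 22).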
