JSON Serialization Performance


Edward Clark

Mar 22, 2018, 11:23:54 AM
to Protocol Buffers
Howdy,

I'm working on a project that recently needed to insert data represented by protobufs into Elasticsearch. Using the built-in JSON serialization, we were able to get data into Elasticsearch quickly; however, the JSON serialization seems rather slow compared to generating JSON with a library like rapidjson. Is this expected, or is it likely we're doing something wrong? Below is info on what we're using, and relative serialization performance results. Surprisingly, rapidjson serialization was faster than protobuf's binary serialization in some cases, which leads me to believe I'm doing something wrong.

Ubuntu 16.04
GCC 7.3, -std=c++17, libstdc++ C++11 string ABI
Protobuf 3.5.1.1 compiled with -O3, proto3 syntax

I've measured the performance of three cases: serializing the protobuf to binary, serializing the protobuf to JSON via MessageToJsonString, and building a rapidjson::Document from the protobuf and then serializing that to JSON. All tests use the same message with different portions of it populated, over 100,000 iterations. The JSON generated by protobuf and by rapidjson match exactly.
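To reproduce timings like these, a minimal harness along the following lines could be used. It's a sketch, not the actual benchmark from this post: `timeSerializer` is a hypothetical helper, and the callable passed to it stands in for any of the three cases (SerializeAsString, MessageToJsonString, or the rapidjson path).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: calls `serialize` `iterations` times and returns the
// elapsed wall-clock time in seconds. If `totalBytes` is non-null, it
// receives the summed size of every serialized output.
double timeSerializer(const std::function<std::string()>& serialize,
                      std::size_t iterations,
                      std::size_t* totalBytes = nullptr) {
    std::size_t bytes = 0;  // consuming the output keeps the work observable
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        bytes += serialize().size();
    }
    const auto stop = std::chrono::steady_clock::now();
    if (totalBytes != nullptr) *totalBytes = bytes;
    return std::chrono::duration<double>(stop - start).count();
}
```

Usage would look like `timeSerializer([&] { return toJSON(msg); }, 100000)`, where `msg` is the populated message. Summing the output sizes discourages the compiler from discarding the serialization work; the std::function call adds the same small per-iteration overhead to every case, so relative comparisons are unaffected.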

Test 1, a single string field populated.
proto binary: 0.01s
proto json:   0.50s
rapidjson:    0.02s

Test 2, 1 top-level string field, 1 nested object with 3 more string fields.
proto binary: 0.02s
proto json:   1.06s
rapidjson:    0.05s

Test 3, 2 string fields and 1 ::google::protobuf::ListValue containing doubles in the format [[[double, double], [double, double], ...]], 36 pairs of doubles total.
proto binary: 1.50s
proto json:   8.87s
rapidjson:    0.41s

Protobuf binary serialization code:
    std::string toBinary(Message const& msg) { return msg.SerializeAsString(); }

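Protobuf JSON serialization code:
    #include <google/protobuf/util/json_util.h>

    std::string toJSON(Message const& msg) {
        std::string json;
        // MessageToJsonString returns a util::Status; check it rather than
        // silently returning partial output.
        auto const status = ::google::protobuf::util::MessageToJsonString(msg, &json);
        return status.ok() ? json : std::string{};
    }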

Rapidjson serialization code:
    // It's a lengthy section of code manually populating the document.  Of note, empty strings and numbers set to 0 are omitted from the JSON as the protobuf does.  The resulting JSON is exactly the same as the protobuf json.
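For readers without rapidjson at hand, the omission rule described above (skip empty strings and zero numbers, as proto3 JSON output does by default) can be sketched with a hand-written writer that appends straight to a std::string. This is a dependency-free stand-in, not the rapidjson code from this post; the `Record` type and its field names are hypothetical, and its string escaping follows RFC 8259 but may not match protobuf's output byte for byte.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical message with two string fields and one integer field.
struct Record {
    std::string name;
    std::string city;
    std::int32_t count = 0;
};

// Appends `s` as a quoted JSON string, escaping quote, backslash, and
// control characters (the characters RFC 8259 requires to be escaped).
void appendJsonString(std::string& out, const std::string& s) {
    static const char kHex[] = "0123456789abcdef";
    out += '"';
    for (char c : s) {
        const unsigned char u = static_cast<unsigned char>(c);
        if (c == '"') {
            out += "\\\"";
        } else if (c == '\\') {
            out += "\\\\";
        } else if (u < 0x20) {
            // Control characters become \u00XX.
            out += "\\u00";
            out += kHex[u >> 4];
            out += kHex[u & 0xF];
        } else {
            out += c;  // other bytes (including UTF-8) pass through unchanged
        }
    }
    out += '"';
}

// Writes only non-default fields, mirroring proto3's default omission.
std::string toJson(const Record& r) {
    std::string out = "{";
    bool first = true;
    auto key = [&out, &first](const char* name) {
        if (!first) out += ',';
        first = false;
        appendJsonString(out, name);
        out += ':';
    };
    if (!r.name.empty()) { key("name"); appendJsonString(out, r.name); }
    if (!r.city.empty()) { key("city"); appendJsonString(out, r.city); }
    if (r.count != 0) { key("count"); out += std::to_string(r.count); }
    out += '}';
    return out;
}
```

Because the field list is fixed at compile time, each field costs one branch and one append, which is the same property that makes the manual rapidjson code fast.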

Any info on how to improve the protobuf to JSON serialization would be greatly appreciated! 

Thanks,
Ed

Feng Xiao

Mar 22, 2018, 6:45:41 PM
to Edward Clark, Protocol Buffers
On Thu, Mar 22, 2018 at 8:23 AM, Edward Clark <ebcl...@gmail.com> wrote:
I'm working on a project that recently needed to insert data represented by protobufs into Elasticsearch. Using the built-in JSON serialization, we were able to get data into Elasticsearch quickly; however, the JSON serialization seems rather slow compared to generating JSON with a library like rapidjson. Is this expected, or is it likely we're doing something wrong?
It's expected for proto-to-JSON conversion to be slower (and likely much slower) than a dedicated JSON library converting objects designed to represent JSON objects to JSON. It's like comparing a library that converts rapidjson::Document to protobuf binary format against protobuf binary serialization. The latter is definitely going to be faster no matter how you optimize the former. Proto objects are just not designed to be efficiently converted to JSON.

That said, there are ways to improve proto-to-JSON conversion, but at the end of the day it's not going to beat proto binary serialization, so performance-sensitive services usually just support the proto binary format instead.
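A toy model may help show where generic conversion spends its time. Below, `toJsonGeneric` walks a field table with type-erased accessors and per-field type dispatch, loosely analogous to a reflection-driven converter, while `toJsonDirect` is written for one specific type, the way hand-written rapidjson code is. This is an illustration under those assumptions, not how protobuf's JSON utilities are actually implemented; `Point`, `FieldInfo`, and both functions are hypothetical, and string escaping is omitted for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical message-like type used only for this illustration.
struct Point {
    std::string label;
    std::int32_t x = 0;
    std::int32_t y = 0;
};

enum class Kind { String, Int32 };

// Field table entry: name, kind, and type-erased accessors (one is unused).
struct FieldInfo {
    std::string name;
    Kind kind;
    std::function<std::string(const Point&)> getString;
    std::function<std::int32_t(const Point&)> getInt;
};

const std::vector<FieldInfo>& pointFields() {
    static const std::vector<FieldInfo> fields = {
        {"label", Kind::String, [](const Point& p) { return p.label; }, nullptr},
        {"x", Kind::Int32, nullptr, [](const Point& p) { return p.x; }},
        {"y", Kind::Int32, nullptr, [](const Point& p) { return p.y; }},
    };
    return fields;
}

// Generic path: loop over the table, dispatch on kind, call through std::function.
std::string toJsonGeneric(const Point& p) {
    std::string out = "{";
    bool first = true;
    for (const FieldInfo& f : pointFields()) {
        std::string value;
        if (f.kind == Kind::String) {
            std::string s = f.getString(p);  // copies the string
            if (s.empty()) continue;         // omit default value
            value = "\"" + s + "\"";
        } else {
            std::int32_t v = f.getInt(p);
            if (v == 0) continue;            // omit default value
            value = std::to_string(v);
        }
        if (!first) out += ',';
        first = false;
        out += "\"" + f.name + "\":" + value;
    }
    out += '}';
    return out;
}

// Direct path: the field list is fixed at compile time, no table or dispatch.
std::string toJsonDirect(const Point& p) {
    std::string out = "{";
    bool first = true;
    if (!p.label.empty()) {
        out += "\"label\":\"" + p.label + "\"";
        first = false;
    }
    if (p.x != 0) {
        if (!first) out += ',';
        out += "\"x\":" + std::to_string(p.x);
        first = false;
    }
    if (p.y != 0) {
        if (!first) out += ',';
        out += "\"y\":" + std::to_string(p.y);
    }
    out += '}';
    return out;
}
```

Both functions produce identical output; the generic version pays for an indirect call, a string copy, and a type branch per field on every message, which a type-specific writer avoids.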
 
Test 3, 2 string fields, and 1 ::google::protobuf::ListValue containing doubles of the format, [[[double, double], [double, double], ...]], 36 pairs of doubles total.
proto binary: 1.50s
proto json:    8.87s
rapidjson:     0.41s
I think this is because of your choice of using google::protobuf::ListValue. That type (along with google::protobuf::Value/Struct) is specifically designed to mimic arbitrary JSON content with proto and is far from efficient compared to protobuf primitive types. I would just use a "repeated double" to represent these 36 pairs of doubles.
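To make the representation cost concrete, here is a toy C++17 model in which each node mimics one google::protobuf::Value that is either a number or a list. `ValueNode`, `makeNested`, `countNodes`, and `flatten` are hypothetical names, and the real Value type also has null, string, bool, and struct kinds. For the 36 pairs in Test 3, the nested [[[x, y], ...]] form takes 110 nodes (1 outer list, 1 inner list, 36 pair lists, 72 numbers), whereas a "repeated double" field is a single flat array of 72 values.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-in for google::protobuf::Value: a number or a list of nodes.
// std::vector of an incomplete element type is permitted since C++17.
struct ValueNode {
    bool isList = false;
    double number = 0.0;
    std::vector<ValueNode> list;  // children, used only when isList is true
};

ValueNode makeNumber(double d) {
    ValueNode v;
    v.number = d;
    return v;
}

// Builds the Test 3 shape [[[x0, y0], [x1, y1], ...]] from flat coordinates.
ValueNode makeNested(const std::vector<double>& coords) {
    ValueNode ring;
    ring.isList = true;
    for (std::size_t i = 0; i + 1 < coords.size(); i += 2) {
        ValueNode pair;
        pair.isList = true;
        pair.list.push_back(makeNumber(coords[i]));
        pair.list.push_back(makeNumber(coords[i + 1]));
        ring.list.push_back(std::move(pair));
    }
    ValueNode outer;
    outer.isList = true;
    outer.list.push_back(std::move(ring));
    return outer;
}

// Counts every node, lists and numbers alike.
std::size_t countNodes(const ValueNode& v) {
    std::size_t n = 1;
    for (const ValueNode& child : v.list) n += countNodes(child);
    return n;
}

// Collects the numbers depth-first, i.e. the "repeated double" layout.
void flatten(const ValueNode& v, std::vector<double>& out) {
    if (!v.isList) {
        out.push_back(v.number);
        return;
    }
    for (const ValueNode& child : v.list) flatten(child, out);
}
```

With "repeated double", the x/y pairing becomes a convention (even indices are x, odd indices are y) rather than part of the structure, which is what lets the serializer treat the whole field as one contiguous run of numbers.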
 


