Using map with Any

Ranadheer Pulluru

Apr 1, 2016, 7:36:11 PM
to Protocol Buffers
Hi,

I'm planning to use protobuf for publishing tick data of financial instruments. The consumers can be written in Java, Python, or Node.js. A tick is expected to contain various fields such as symbol, ask_price, bid_price, trade_price, trade_time, trade_size, etc. Basically, it is a sort of map from field name to value, where the value can be of any primitive type. I thought I could define the schema of the Tick data structure using map and Any as follows:


syntax = "proto3";

package tutorial;

import "google/protobuf/any.proto";

message Tick {
    string subject = 1; // name of the financial instrument - something like MSFT, GOOG, etc
    uint64 timestamp = 2; // millis from epoch signifying the timestamp at which the object is constructed at the publisher side
    map<string, google.protobuf.Any> fvmap = 3; // the actual map having field name and values. Something like {ask_price: 10.5, bid_price: 9.5, trade_price: 10, trade_size: 5}
}

Though I'm able to generate the code in different languages for this schema, I'm not sure how to populate the values in the fvmap.

public class TickTest
{
    public static void main(String[] args)
    {
        Tick.Builder tick = Tick.newBuilder();
        tick.setSubject("ucas");
        tick.setTimestamp(System.currentTimeMillis());
        Map<String, Any> fvMap = tick.getMutableFvmap();
//        fvMap.put("ask", value); // Not sure how to pass values like 10.5/9.5/10/5 to Any object here.
    }
}



Could you please let me know how to populate the fvMap with different fields and values here? Please feel free to tell me if using map and Any is not the right choice and if there are better alternatives.
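
For reference, one possible way to get primitives into an Any is to wrap each value in one of the wrapper messages from google/protobuf/wrappers.proto and pack it. The sketch below is only an illustration under assumptions: it needs protobuf-java on the classpath, the class name AnyValuesSketch is hypothetical, and it fills a plain java.util.Map rather than the generated Tick builder. In recent protobuf-java releases the generated builder should accept the same entries through putFvmap.

```java
import com.google.protobuf.Any;
import com.google.protobuf.DoubleValue;
import com.google.protobuf.Int64Value;
import com.google.protobuf.InvalidProtocolBufferException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; not part of the generated Tick code.
final class AnyValuesSketch {

    // Wrap each primitive in a wrapper message, then pack it into an Any.
    static Map<String, Any> buildValues() {
        Map<String, Any> values = new LinkedHashMap<>();
        values.put("ask_price", Any.pack(DoubleValue.newBuilder().setValue(10.5).build()));
        values.put("bid_price", Any.pack(DoubleValue.newBuilder().setValue(9.5).build()));
        values.put("trade_size", Any.pack(Int64Value.newBuilder().setValue(5L).build()));
        return values;
    }

    // Read a double back out, checking the packed type before unpacking.
    static double readDouble(Any any) {
        if (!any.is(DoubleValue.class)) {
            throw new IllegalArgumentException("not a DoubleValue: " + any.getTypeUrl());
        }
        try {
            return any.unpack(DoubleValue.class).getValue();
        } catch (InvalidProtocolBufferException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Any> values = buildValues();
        // With a generated Tick builder, these entries would go into the fvmap field.
        System.out.println(readDouble(values.get("ask_price")));
    }
}
```

Each Any carries a type URL string alongside the payload, so this layout is likely to be noticeably larger on the wire than plain typed fields.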

Thanks
Ranadheer

Feng Xiao

Apr 1, 2016, 8:50:09 PM
to Ranadheer Pulluru, Protocol Buffers
It seems to me that google.protobuf.Struct suits your purpose better:
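
As a rough sketch of what that could look like in Java (assuming protobuf-java on the classpath, and assuming the Tick schema were changed to carry a google.protobuf.Struct field in place of the map; the class name StructTickSketch is hypothetical):

```java
import com.google.protobuf.Struct;
import com.google.protobuf.Value;

// Hypothetical helper class; a real publisher would set the Struct on a Tick builder.
final class StructTickSketch {

    // Build a Struct holding the per-tick field/value pairs.
    static Struct buildFields() {
        return Struct.newBuilder()
            .putFields("symbol", Value.newBuilder().setStringValue("MSFT").build())
            .putFields("ask_price", Value.newBuilder().setNumberValue(10.5).build())
            .putFields("bid_price", Value.newBuilder().setNumberValue(9.5).build())
            .putFields("trade_size", Value.newBuilder().setNumberValue(5).build())
            .build();
    }

    // Report which kind of value a Struct entry holds, as a consumer might.
    static String describe(Value value) {
        switch (value.getKindCase()) {
            case NUMBER_VALUE:
                return "number:" + value.getNumberValue();
            case STRING_VALUE:
                return "string:" + value.getStringValue();
            case BOOL_VALUE:
                return "bool:" + value.getBoolValue();
            default:
                return value.getKindCase().name();
        }
    }

    public static void main(String[] args) {
        Struct fields = buildFields();
        System.out.println(describe(fields.getFieldsOrThrow("symbol")));
        System.out.println(describe(fields.getFieldsOrThrow("ask_price")));
    }
}
```

Note that Struct stores every number as a double, so an integer field such as trade_size comes back as 5.0 and very large 64-bit integers may lose precision.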

Kevin Baker

Apr 2, 2016, 1:17:47 PM
to Protocol Buffers, prana...@gmail.com
Hi Ranadheer,

Just a piece of advice, but you may want to try to keep your types as strict as possible... i.e. instead of using a general map to store the parameters, use an additional type to keep everything easy to represent. Something like:

message Tick {
    string subject = 1; // name of the financial instrument - something like MSFT, GOOG, etc
    uint64 timestamp = 2; // millis from epoch signifying the timestamp at which the object is constructed at the publisher side
    TickData tick_data = 3;
}


message TickData {
    float ask_price = 1;
    float bid_price = 2;
    float trade_price = 3;
    uint32 trade_size = 4;
}


or even better, just add all the components into one message:


message Tick {
    string subject = 1; // name of the financial instrument - something like MSFT, GOOG, etc
    uint64 timestamp = 2; // millis from epoch signifying the timestamp at which the object is constructed at the publisher side

    float ask_price = 3;
    float bid_price = 4;
    float trade_price = 5;
    uint64 trade_size = 6;

    ...
}

... adding any other possible fields you might have in your data. As well as being a lot more compact on the wire, which saves bandwidth and CPU, this forces you to think about your data and what might be in it, which will result in far fewer bugs down the road for you and your client consumers. You won't have to worry about typos like accidentally typing 'bid_prce' or 'bidPrice' or 'bidprice' in the Map.

Protobuf will not send any fields that are equal to their default values, so you don't pay any wire-size penalty for having a lot of optional data in the Tick message. You can also still add fields to the message later while keeping backwards compatibility with old consumers.
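
A small way to see this for yourself, as a sketch only: the wrapper message DoubleValue (a proto3 message with a single double field) stands in here for one numeric field of Tick, since the generated Tick class is not available in this snippet. The class name DefaultValueSizeSketch is hypothetical, and protobuf-java is assumed to be on the classpath.

```java
import com.google.protobuf.DoubleValue;

// Hypothetical helper class used only to illustrate default-value elision.
final class DefaultValueSizeSketch {

    // Serialize a one-field proto3 message and return its encoded length in bytes.
    static int encodedSize(double value) {
        return DoubleValue.newBuilder().setValue(value).build().toByteArray().length;
    }

    public static void main(String[] args) {
        // A field left at its default (0.0) is not written at all.
        System.out.println(encodedSize(0.0));   // prints 0
        // A non-default double costs one tag byte plus eight payload bytes.
        System.out.println(encodedSize(10.5));  // prints 9
    }
}
```

The flip side is that a receiver cannot tell "price is exactly 0" apart from "price was never set" for such fields.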

Also, another little pedantic thing, but if you are using the timestamp like JavaScript's Date.now(), always name it timestamp_utc instead of just timestamp... eventually someone will stuff a local time in there and confuse everyone... better to be explicit.

Kevin

Ranadheer Pulluru

Apr 3, 2016, 7:54:54 AM
to Protocol Buffers, prana...@gmail.com
Hi Kevin,

Thanks for the suggestion. I did consider this option of including all the possible fields in one message. But since some financial instruments, like convertible bonds and options, have 600-700 fields on the server side, instantiating such a big object for every update seems a bit costly (though on the wire it is efficient, as you mentioned, since only the fields that are set will be present). Using an object pool, we can probably avoid the instantiation cost, though.

Also, once the object is deserialized on the receiving side, I feel it is going to be a little tricky to figure out which of the fields were actually set on the sender side. My understanding is that once the object is deserialized, all fields that were not set by the sender will have default values, and we need to iterate over all the fields and compare them against their defaults to know which ones the sender actually set. So overall, I felt this approach is a bit inefficient for classes with many fields. We could probably group sets of fields and use a separate message for each group (like QuoteUpdate, PositionUpdate, etc.), but that requires a lot of changes in my current code base, and theoretically we might end up with too many groups.
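
On the question of finding which fields were set: a possible alternative to comparing every field against its default by hand is the Java reflection call getAllFields(), which returns only the populated fields of a message. The sketch below is an illustration under assumptions: it uses the well-known Timestamp message (fields seconds and nanos) as a stand-in for a large Tick, the class name PopulatedFieldsSketch is hypothetical, and protobuf-java is assumed to be on the classpath.

```java
import com.google.protobuf.Descriptors.FieldDescriptor;
import com.google.protobuf.Message;
import com.google.protobuf.Timestamp;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; works on any generated message type.
final class PopulatedFieldsSketch {

    // Collect the names of the fields that getAllFields() reports as populated.
    static List<String> populatedFieldNames(Message message) {
        List<String> names = new ArrayList<>();
        for (FieldDescriptor field : message.getAllFields().keySet()) {
            names.add(field.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Stand-in message: only 'seconds' is given a non-default value.
        Timestamp standIn = Timestamp.newBuilder().setSeconds(5).build();
        System.out.println(populatedFieldNames(standIn));
    }
}
```

For plain proto3 scalar fields, a value explicitly set to its default (for example a price of 0) does not show up in this list, so it does not fully replace explicit presence tracking.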

Also, fields like positions, market value, etc. need to be published for each portfolio, where the portfolios can be dynamic. So, having support for a Map-like object seems to solve both problems. Having said this, I'm open to suggestions, because the Map approach does have its limitations: the field names are free-form and can cause issues later.

Point duly noted about the timestamp_utc. I changed my schema accordingly.

Thanks
Ranadheer

Ranadheer Pulluru

Apr 3, 2016, 8:00:17 AM
to Protocol Buffers, prana...@gmail.com
Hi Xiao,

Played with Struct a bit and it seems like a perfect fit for what I was looking for. Thanks a lot for the suggestion. I'm having some issues figuring out the data type of the value on the Python side, but most likely that is because of a Python runtime version mismatch, which I'm discussing in the other mail chain.

Thanks again. I will post the performance numbers for serializing/deserializing in different languages (at least Python, C++, Java and Node.js).

-Ranadheer

Kevin Baker

Apr 3, 2016, 2:16:11 PM
to Protocol Buffers, prana...@gmail.com
Hi Ranadheer,

Sounds like you already had a good handle on the different tradeoffs then! I personally prefer grouping if there are that many fields ... you would still have to look through each possible field to see if it is set in the Map.

Also, using a Map doesn't necessarily mean there are lower object instantiation costs, since depending on the implementation the keys & values of the Map may have to be allocated somewhere as well... Especially with performance stuff, there is no substitute for prototyping and timing example use cases. Looking forward to seeing some of the performance numbers!

Kevin