Additional data types

jhakim

Apr 4, 2012, 3:37:31 PM
to Protocol Buffers
Any plans to provide out-of-the-box support for commonly used data types
such as Date (encoded as String) and BigDecimal/BigInteger? Seems
this would be of interest to a lot of users.

Alexandru Turc

Apr 4, 2012, 5:09:18 PM
to jhakim, Protocol Buffers

Proto files are mapped to many languages; Date and BigDecimal are Java-specific.

Alexandru Turc

Apr 4, 2012, 5:42:00 PM
to Jawaid Hakim, Protocol Buffers

What would be the mapping for C or C++? I think there is an advantage in keeping the set of data types very limited, to primitive values that can be easily mapped to multiple languages.

Otherwise things can get complicated. Let's take date as an example. Its representation depends on the calendar used. Gregorian is very common, but other systems are used as well (Japanese, Buddhist). Also, when you say Date, do you mean just day accuracy, or down to the hour, minute, second, millisecond, etc.? With all of these, a string representation can get quite long, which goes against protocol buffers' goal of keeping the serialized size small.

However, in a particular application based on protocol buffers you most likely do not need all that flexibility, and you can adopt some conventions: only the Gregorian calendar is used and only day accuracy is needed. With these, you can probably represent a date as a simple int32 holding a value relative to an absolute date. This is more compact and much faster to process. If you are only using Java, then you can stick with Java's convention of representing time as the number of milliseconds since January 1st, 1970 UTC and use an int64 for this, which reduces the chance of making mistakes.
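
As a minimal sketch of the day-accuracy convention described above (Gregorian calendar only, a date stored as an int32 offset from an agreed reference date), something like the following Python would do. The choice of January 1, 1970 as the reference date and the helper names date_to_days and days_to_date are assumptions made for illustration; they are not part of protocol buffers.

```python
from datetime import date, timedelta

# Assumed reference date; any date that all readers and writers agree on works.
REFERENCE_DATE = date(1970, 1, 1)

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def date_to_days(d: date) -> int:
    """Return the number of days from REFERENCE_DATE to d (negative if earlier)."""
    days = (d - REFERENCE_DATE).days
    # Python's date range is far narrower than int32, but check anyway so the
    # value is guaranteed to fit an int32 (or sint32) field.
    if not INT32_MIN <= days <= INT32_MAX:
        raise OverflowError("date does not fit in an int32 day count")
    return days


def days_to_date(days: int) -> date:
    """Inverse of date_to_days."""
    return REFERENCE_DATE + timedelta(days=days)


if __name__ == "__main__":
    stored = date_to_days(date(2012, 4, 4))
    print(stored, days_to_date(stored))
```

If dates before the reference date are common, a sint32 field is probably the better fit, since negative values in a plain int32 field always take ten bytes on the wire.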

On Apr 4, 2012, at 11:21 AM, Jawaid Hakim wrote:

> Date and decimal types are ubiquitous and in wide use. Language specific bindings could easily be created - this is exactly what we do in some other open source projects that I contribute to. The way I envision it, protocol buffers would provide 'date' and 'decimal' types - protoc compiler would compile these into language specific data types (e.g. java.util.Date for Java and DateTime for C#).

Jawaid Hakim

Apr 4, 2012, 5:54:04 PM
to Alexandru Turc, Protocol Buffers
You are correct to point out the complications of dealing with complex data types. All the more reason why it would be great to not have the developer community keep re-inventing the wheel. But I understand why this is not on the radar of the Protocol Buffers team.

My group builds applications using multiple languages, including Java and C#, so a simple int64 for date representation does not work.

Jawaid Hakim

Apr 4, 2012, 5:21:46 PM
to Alexandru Turc, Protocol Buffers
Date and decimal types are ubiquitous and in wide use. Language specific bindings could easily be created - this is exactly what we do in some other open source projects that I contribute to. The way I envision it, protocol buffers would provide 'date' and 'decimal' types - protoc compiler would compile these into language specific data types (e.g. java.util.Date for Java and DateTime for C#).

Jawaid Hakim
Chief Technology Officer
CodeStreet LLC
646 442 2804
www.codestreet.com

Christopher Smith

Apr 5, 2012, 12:53:08 AM
to jhakim, Protocol Buffers
AFAIK the answer is no. A lot of the value of protocol buffers derives from keeping their functionality simple. There are plenty of all singing/all dancing serialization frameworks already. ;-)

I think date in particular is fraught with peril. I'd recommend against encoding them as strings. What I've done is encode all date/time data as int64s, with the value being milliseconds since the UTC epoch. Even that has complications, but it is a "good enough" approach.
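
For illustration, here is a rough Python sketch of the int64 convention described above: milliseconds since the UTC epoch. The helper names to_epoch_millis and from_epoch_millis are made up for this example. Rejecting naive datetimes and truncating sub-millisecond precision are assumptions of the sketch, not something the thread specifies.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MILLISECOND = timedelta(milliseconds=1)


def to_epoch_millis(dt: datetime) -> int:
    """Convert a time-zone-aware datetime to milliseconds since the UTC epoch."""
    if dt.tzinfo is None:
        # A naive datetime has no defined instant; refuse rather than guess.
        raise ValueError("naive datetime: attach a time zone first")
    # timedelta // timedelta is an exact integer floor division, so there is
    # no float rounding; microseconds below one millisecond are dropped.
    return (dt - EPOCH) // ONE_MILLISECOND


def from_epoch_millis(millis: int) -> datetime:
    """Convert milliseconds since the UTC epoch back to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


if __name__ == "__main__":
    posted = datetime(2012, 4, 4, 15, 37, 31, tzinfo=timezone.utc)
    millis = to_epoch_millis(posted)
    print(millis, from_epoch_millis(millis).isoformat())
```

Note that the original time zone is not carried in the value: two datetimes naming the same instant in different zones map to the same int64, which is one of the complications a real schema has to decide how to handle.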

In theory, BigInteger could be encoded using the existing varint encoding, so you could write a module fairly easily, and of course once you can do that and encode floats, BigDecimal is straightforward. Alternatively, you could store the raw bytes of the BigDecimal in a bytes field.
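
A hedged sketch of the varint idea above, in Python: an arbitrary-precision integer is zigzag-mapped and written as a base-128 varint, and a decimal is written as an (unscaled integer, exponent) pair. The pair representation is an assumption about what such a module might do, not something stated in this thread, and every function name here is hypothetical. Standard protocol buffer varint fields are limited to 64 bits, so the output of this sketch would have to travel inside a bytes field rather than a native integer field.

```python
from decimal import Decimal


def zigzag(n: int) -> int:
    """Map signed to unsigned so small magnitudes stay small: 0,-1,1,-2 -> 0,1,2,3."""
    return n * 2 if n >= 0 else -n * 2 - 1


def unzigzag(u: int) -> int:
    """Inverse of zigzag."""
    return u // 2 if u % 2 == 0 else -(u + 1) // 2


def encode_varint(u: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, low group first, high bit = more."""
    if u < 0:
        raise ValueError("varint input must be non-negative")
    out = bytearray()
    while True:
        group = u & 0x7F
        u >>= 7
        if u:
            out.append(group | 0x80)
        else:
            out.append(group)
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0):
    """Return (value, next_position); raises IndexError on truncated input."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_decimal(d: Decimal) -> bytes:
    """Encode a finite Decimal as varint(unscaled) followed by varint(exponent)."""
    sign, digits, exponent = d.as_tuple()
    if not isinstance(exponent, int):
        raise ValueError("NaN and infinity are not supported in this sketch")
    unscaled = int("".join(map(str, digits)))
    if sign:
        unscaled = -unscaled  # note: the sign of negative zero is lost
    return encode_varint(zigzag(unscaled)) + encode_varint(zigzag(exponent))


def decode_decimal(data: bytes) -> Decimal:
    """Inverse of encode_decimal; built from a tuple so no context rounding applies."""
    raw_unscaled, pos = decode_varint(data)
    raw_exponent, _ = decode_varint(data, pos)
    unscaled = unzigzag(raw_unscaled)
    digits = tuple(int(c) for c in str(abs(unscaled)))
    return Decimal((1 if unscaled < 0 else 0, digits, unzigzag(raw_exponent)))


if __name__ == "__main__":
    big = -(2**100)
    value, _ = decode_varint(encode_varint(zigzag(big)))
    print(unzigzag(value) == big, decode_decimal(encode_decimal(Decimal("123.45"))))
```

The raw-bytes alternative mentioned above would skip the varint step and store the two's-complement bytes of the unscaled value directly; either way, every language binding needs to agree on the layout.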

To make BigInteger a part of the standard protocol buffer definition, there's a lot more work involved, and a price to be paid. The challenge is having a consistent, tested, efficient mechanism for handling this in the plethora of languages that protocol buffers support. Without that, you undermine the ability of protocol buffers to always be parsed consistently everywhere, which is a very important feature. This is a big undertaking, particularly given that some languages don't have a standard equivalent type. Given that it is a data type needed so much less often, you can see why it likely doesn't make a lot of sense to put it in the standard implementation/language.

--Chris

Christopher Smith

Apr 5, 2012, 12:57:34 AM
to Jawaid Hakim, Alexandru Turc, Protocol Buffers
Nothing prevents you from making a module available for everyone's benefit. If it is broadly useful, it will undoubtedly be universally adopted.

--Chris

P.S.: What is a "decimal type"?

Christopher Smith

Apr 5, 2012, 1:35:48 AM
to Jawaid Hakim, Alexandru Turc, Protocol Buffers
On Apr 4, 2012, at 2:54 PM, Jawaid Hakim <Jawaid...@codestreet.com> wrote:

> My group builds applications using multiple languages, including Java and C#, so a simple int64 for date representation does not work.

That there isn't a simple way to do it is a pretty nasty strike against having a standard implementation.

I'm surprised, though, that int64 wouldn't suffice. Any language that supports more than a couple of popular OS platforms is going to need to have some logic somewhere for moving back and forth between its preferred date/time objects and something that looks an awful lot like an int64, and usually it's easily available.

So far I've done this with C++ (using boost's date time objects), Java, C#, Python, JavaScript, and I think Perl once too; it hasn't needed more than a few lines of code for any of them (which is really saying something in the case of Java). Have I unwittingly made a bug, or do your complications come from a different scope?

Jawaid Hakim

Apr 5, 2012, 8:05:18 AM
to Christopher Smith, Alexandru Turc, Protocol Buffers
C# has a decimal type and Java has BigDecimal - 'decimal' seems like a generic data type name for Protocol Buffers.

I hear you about contributing a module; will see if that is possible.

Igor Gatis

Feb 24, 2014, 7:55:50 PM
to prot...@googlegroups.com, Christopher Smith, Alexandru Turc, jawaid...@codestreet.com
If you're using protobuf-csharp-port, you may want to try out this patch which generates protobuf classes with native Decimal and DateTime structs. Wire format wise, they are represented by two "built-in" proto messages.

Here is the link: