Apache Avro - binary format for openrtb?


Willard Simmons

May 12, 2011, 5:48:24 PM
to openr...@googlegroups.com
Hi  OpenRTB dev,

Robert Foldes just posted a blog post discussing some arguments for using Apache Avro instead of json for OpenRTB real-time bidding formats.  


Check it out. Happy to get comments back on this. The reason this came up is that, as we work on the 2.0 spec, we have realized that the message size is getting quite large. Binary message formats are one tactic for reducing bandwidth usage and CPU processing time.

--bill



--


Willard Simmons

Chief Technology Officer

DataXu

 

281 Summer Street | Boston, MA | 02210
O: 
617.752.1123

http://twitter.com/DataXu

hottbucks

May 13, 2011, 9:40:21 AM
to openrtb-dev
Very nice blog post! And the sample code on GitHub is very simple and
clear.

Avro has brought great benefits to us at Where Inc. We use Avro as an
intermediate format for big-data MapReduce jobs, and it gives a
commendable performance boost compared to plain JSON. We also take
advantage of its schema and code generation features. The Avro snapshot
API can help with examining the data, especially in our debugging process
(http://search-hadoop.com/jd/avro/org/apache/avro/tool/package-summary.html).

According to this benchmark, Avro data is smaller and faster to process
than that of other serialization systems:
http://code.google.com/p/thrift-protobuf-compare/wiki/Benchmarking

- QiHui Zhang


On May 12, 5:48 pm, Willard Simmons <b...@dataxu.com> wrote:
> Hi  OpenRTB dev,
>
> Robert Foldes just posted a blog post discussing some arguments for using Apache Avro instead of json for OpenRTB real-time bidding formats.
>
> http://www.dataxu.com/2011/05/three-reasons-why-apache-avro-data-seri...

Patrik Oscarsson

May 16, 2011, 5:43:32 AM
to openr...@googlegroups.com
Hi, we looked at Avro before we started using Protocol Buffers internally. It seems good as a protocol, but we couldn't find a usable/mature .NET implementation. As long as that's the case, it seems like a bad idea to base a cross-platform standard around it. Are we the only ones using .NET?

Patrik
Admeta AB

Willard Simmons

May 16, 2011, 7:03:58 AM
to openr...@googlegroups.com
Hi Patrik,

The first version of .NET support is in Avro 1.5.1  

But even if a language is not supported, Avro can be switched to JSON mode, which provides JSON messages driven by a schema.

Nothing in OpenRTB should be considered mandatory. The group has no intention of telling others how to design their systems (this would be bad for industry innovation). However, the more you are in line with the spec, the easier a time you will have integrating with peers.

-bill




Sam Tingleff

May 16, 2011, 12:26:37 PM
to openrtb-dev

If this were an internal service we would probably use PB or thrift
instead of Avro, but only because we have institutional investments
and knowledge in both. I can see the technical advantages Robert
describes. Any of the three would be a big improvement over json.


Patrik Oscarsson

May 17, 2011, 6:26:44 AM
to openr...@googlegroups.com
The .Net implementation seems to be highly experimental and only exists as a source tarball in the AVRO tree (are there unit tests, what functionality is supported, how good is .Net performance, etc?). 

If the goal of the spec is interoperability, and if the spec is to suggest one binary protocol to use (which would make sense), then I guess it's best to pick a protocol that's widely supported among different platforms.

Patrik

Simon Reavely

May 18, 2011, 12:03:40 AM
to openr...@googlegroups.com
I've been following the .NET implementation (since we have to interop between .NET and Java in places). For example on the AVRO-533 progress, see one of the latest posts: 

I wouldn't call it "experimental" but it's very, very green (i.e. new) and it's going to be a tough sell for the .NET community compared to thrift and protobuf. 
However, I think that Avro is clearly superior. 

So what to do? Options are clearly: 

1. Use Avro (with folks falling back to JSON if there are issues on their platform)
2. Use something inferior but more cross-platform and proven (thrift or protobuf). BTW, I'd go protobuf, having seen thrift's .NET support in action.
...it's a tough call. Right now we use thrift but I'm keen on Avro (but since I'd be using Java anyway you can ignore my vote from the .NET side).

Maybe a vote within the community? I think the question is whether you gamble on Avro... I think it's a safe bet given the activity level, but others may disagree.

Cheers,
Simon
--
Simon Reavely
simon....@gmail.com

Jason Shao

May 18, 2011, 10:57:54 AM
to openr...@googlegroups.com

Alternatively – I think our “standard” protocol format at the wire could be JSON. Avro could be an “optional” or “experimental” binding which SSP/DSP partners can choose to implement – though I’d go so far as to suggest we *require* SSP side partners to implement JSON to be able to say they support OpenRTB bidding.

 

My main reasoning is I think we should define things at the wire, not at the lib/implementation level. So – the reference impl could for instance use AVRO to generate JSON and/or AVRO-binary-format (if that’s clean and workable) but we don’t bake AVRO into the spec as a format (separate from implementation) until it’s proven, and ideally some people have production experience with it, and report back that it worked well and they’d like to spread usage for the efficiency benefits over the wire.

 

Jason

Willard Simmons

May 18, 2011, 11:59:40 AM
to openr...@googlegroups.com
Hi,

I agree with Jason here.

JSON is well supported and great for many reasons, so it's a good choice as a primary format that no one really argues against. However, it has two drawbacks:
  1. Although smaller than XML, it's still big if you are throwing around a lot of data, since you have to re-transmit the keys for every field with every message. (In practice this leads to short, cryptic keys and values, which lead to bugs and other issues.)
  2. Validation can be hard, since it's generally not schema-based.
Avro supports JSON mode and binary mode. It can be used to validate JSON against a schema, as well as to provide a framework for schema evolution of JSON. Although the .NET implementation is not yet mature, it's on its way, and there's no requirement to use it. In fact, there are no strict compliance rules for OpenRTB in general. If you want to implement the same fields in protobuf or thrift, go for it. We imagine picking one binary format of choice is helpful to the group, but other formats could also be supported. Mandating a format would be like saying you are forced to use Java or C instead of .NET. We're not going to do that.
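
(A rough illustration of the first drawback, not taken from the blog post or any reference code. The sketch below hand-encodes one small message using the binary rules in the Avro specification: zigzag varints for int/long, length-prefixed UTF-8 strings, a branch index in front of union values, and count-prefixed array blocks ended by a zero. The record layout with fields id, at and imp is an assumption made for the example, and the helper names are hypothetical. A real system would use an Avro library rather than this.)

```python
import json


def zigzag_varint(n: int) -> bytes:
    """Encode an integer as Avro encodes int/long: zigzag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag: small negative numbers become small positives
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string: byte length as a long, followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data


def encode_bid_request(req: dict) -> bytes:
    """Hypothetical record: id (string), at (union [int, null]), imp (array of {impid})."""
    out = bytearray()
    out += encode_string(req["id"])
    if req.get("at") is None:
        out += zigzag_varint(1)          # union branch 1 = null, no payload
    else:
        out += zigzag_varint(0)          # union branch 0 = int
        out += zigzag_varint(req["at"])
    imps = req["imp"]
    if imps:
        out += zigzag_varint(len(imps))  # one block holding all items
        for imp in imps:
            out += encode_string(imp["impid"])
    out += zigzag_varint(0)              # zero count terminates the array
    return bytes(out)


request = {"id": "TestReadFromJsonFile", "at": 1, "imp": [{"impid": "123"}]}
json_bytes = json.dumps(request, separators=(",", ":")).encode("utf-8")
binary_bytes = encode_bid_request(request)
print(len(json_bytes), "bytes as compact JSON;", len(binary_bytes), "bytes as schema-driven binary")
```

No field names appear in the binary form because both sides are assumed to hold the schema already; that is where the saving comes from. One tiny message is not a benchmark, so treat the printed sizes as an illustration only.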

OpenRTB is a community-developed guideline, not a strict set of rules. We may, as a community, provide open source reference code, but there is no obligation to use that code in its exact form. It's released under a BSD license (free / open / non-viral).

-bill




Simon Reavely

May 23, 2011, 2:01:50 AM
to openr...@googlegroups.com
Thought I would let everyone know that today I started developing some Avro schemas and junit tests for the OpenRTB Mobile Spec 1.0

I'm not sure if these would be useful to contribute, so let me know, and please if I'm repeating anyone's work let me know too. 

BTW...my avro schemas look like this: 

{
  "type": "record",
  "namespace": "org.openrtb.mobile",
  "name": "BidRequest",
  "doc": "Top-Level BidRequest Object from OpenRTB Mobile 1.0 Spec",
  "fields": [
    {"name": "id", "type": "string", "required": "true", "comment": "Unique ID of the bid request (i.e., the overall auction ID)."},
    {"name": "at", "type": ["int", "null"], "required": "false", "comment": "Auction type - 1 indicates 1st Price, others denote alternate rules."},
    {"name": "imp", "type": {"type": "array", "items": {
      "type": "record",
      "namespace": "org.openrtb.mobile",
      "name": "BidImpression",
      "doc": "Bid Impression Object used in Bid Request from OpenRTB Mobile 1.0 Spec",
      "fields": [
        {"name": "impid", "type": "string", "required": "true", "comment": "Unique ID of the impression."}
      ]
    }}, "required": "true", "comment": "1 object per impression being offered for bid"}
  ]
}

So that produces something like this:
{
  "id": "TestReadFromJsonFile",
  "at": {"int": 1},
  "imp": [{"impid": "123"}]
}

BTW...The use of a union for nullable fields is a bit weird to me (see the 'at' field). Maybe I'll get used to it!
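
For anyone else puzzled by the 'at' field: in Avro's JSON encoding, a non-null union value is written as a one-key object whose key is the branch's type name, while null is written as plain JSON null. Below is a minimal sketch of that wrapping rule, assuming a trimmed-down copy of the schema above. The function to_avro_json and the names SCHEMA and plain are hypothetical, the code handles only primitive union branches, and it is not how the Avro library itself does it.

```python
import json

# Python types accepted for the primitive branches this sketch understands.
PRIMITIVES = {"string": str, "int": int}

# Trimmed-down copy of the BidRequest schema (doc/comment attributes omitted).
SCHEMA = {
    "type": "record",
    "name": "BidRequest",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "at", "type": ["int", "null"]},
        {"name": "imp", "type": {"type": "array", "items": {
            "type": "record",
            "name": "BidImpression",
            "fields": [{"name": "impid", "type": "string"}],
        }}},
    ],
}


def to_avro_json(value, schema):
    """Convert a plain JSON-style value into the shape of Avro's JSON encoding."""
    if isinstance(schema, list):  # union
        if value is None:
            if "null" not in schema:
                raise ValueError("null is not a branch of this union")
            return None  # the null branch is written as bare null
        for branch in schema:
            py_type = PRIMITIVES.get(branch)
            if py_type is not None and isinstance(value, py_type):
                return {branch: value}  # wrap the value in {"<type name>": value}
        raise ValueError(f"no union branch matches {value!r}")
    if isinstance(schema, str):  # primitive outside a union: unchanged
        return value
    if schema["type"] == "record":
        return {f["name"]: to_avro_json(value.get(f["name"]), f["type"])
                for f in schema["fields"]}
    if schema["type"] == "array":
        return [to_avro_json(item, schema["items"]) for item in value]
    raise ValueError(f"unsupported schema: {schema!r}")


plain = {"id": "TestReadFromJsonFile", "at": 1, "imp": [{"impid": "123"}]}
print(json.dumps(to_avro_json(plain, SCHEMA)))
```

The wrapper is there so a reader can tell which branch of the union was meant. In the binary encoding the same job is done by a small branch index written before the value.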

Cheers,
Simon
--
Simon Reavely
simon....@gmail.com

Kyle Lussier (http://Tickle.Me)

Jun 5, 2011, 6:02:31 PM
to openrtb-dev
Right now we are adding JSON support. JSON is a pretty good, clean,
clear standard that is used in a lot of places. If the decision is
made to make AVRO the primary format, we can add an AVRO
import/export layer as needed.

I think either implementation can be made computationally quick.

The key would appear to be how much real bandwidth is saved by
proper usage of either format, traded off against slightly more
complexity on the implementation side (and possibly slower
adoption).

Is the bandwidth saved worth (possibly) slowing adoption?

Jamie McCrindle

Jun 5, 2011, 6:24:54 PM
to openr...@googlegroups.com
Hi,

We're using protocol buffers internally, which is working quite well for us. There are definitely advantages to a binary format with a standardised way of specifying a schema. That said, I'd prefer if the standard did use JSON and then had a 'supplement' of binary standards that could be used e.g. Avro etc. I believe most of the binary formats can transform to and from JSON without being 'leaky' provided the original specification is JSON only.

regards,
Jamie
XA.net

Willard Simmons

Jun 5, 2011, 6:36:50 PM
to openr...@googlegroups.com
Hi,

Avro will be optional.  We have found that binary formats can decrease bandwidth by 50%, and computational cost somewhat, depending on your implementation.   Avro can be used to generate & parse json based on an AVRO schema, so it's possible for exchanges and bidders to support either mode, as desired.




Simon Reavely

Jun 12, 2011, 9:34:17 PM
to Jason Shao, Willard Simmons, Chip Pate, Robert Foldes, openr...@googlegroups.com
Hi, 

I updated my sandbox with the complete openrtb mobile avro schemas. 

If possible I'd like to work with you to move them into the google code sandbox. 

...in the meanwhile I'm continuing to work on the unit tests and integration. 

Cheers,
Simon

On Tue, May 31, 2011 at 6:46 PM, Simon Reavely <simon....@gmail.com> wrote:
FYI, in the meanwhile I've created my own sandbox here: 

...its far from complete (schema-wise) but now that I've got some testing infrastructure in place evolving the schemas based on the mobile 1.0 spec should not be hard. Assuming I make sufficient progress on other work deadlines I'm hoping to have the full schemas fleshed out in about a week. 

The schemas themselves are in this folder: 

Cheers,
Simon



--
Simon Reavely
simon....@gmail.com