Batching is in the air...

John Panzer

Apr 7, 2008, 9:06:29 PM

to opensocial-an...@googlegroups.com

http://blogs.msdn.com/astoriateam/archive/2008/04/06/batching-data-service-requests.aspx

(Though they have a lot of complicated ChangeSetty-thingies added into the mix, which I don't think OpenSocial needs.)

Kevin Brown

Apr 7, 2008, 10:32:11 PM

to opensocial-an...@googlegroups.com

The interesting thing here isn't batching in and of itself (although moving to standard HTTP-based batching instead of the ad hoc mechanisms most web services employ is interesting too); it's the auto-expansion. Something like this might make sense in OpenSocial when retrieving complex friend graphs.

On Mon, Apr 7, 2008 at 6:06 PM, John Panzer <jpa...@google.com> wrote:

http://blogs.msdn.com/astoriateam/archive/2008/04/06/batching-data-service-requests.aspx

(Though they have a lot of complicated ChangeSetty- thingies added in to the mix, which I don't think OpenSocial needs.)

--
~Kevin

John Panzer

Apr 7, 2008, 11:03:36 PM

to opensocial-an...@googlegroups.com

Yep, definitely. If it's simple enough to implement, URI Templates would be a nice, extensible, possibly standards-based thing to use for this.

Subbu Allamaraju

Apr 8, 2008, 1:26:26 PM

to OpenSocial and Gadgets Specification Discussion

Hi John,

Between the batch format outlined in your proposal and the one used in
the post linked below, the latter has some merits, as it relies on
application/http, which, in theory, has well-defined semantics. I say
"in theory" because I am not aware of any proxies or caches that can
understand this content type.

Any reason to use X-Batch-Operation over application/http?
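
To make the comparison concrete, here is a minimal sketch of how a client might wrap individual requests as application/http parts inside a multipart/mixed envelope. The helper name build_batch_body, the Content-ID correlation scheme, and the example paths are assumptions for illustration only; nothing in this thread fixes the wire format.

```python
import uuid


def build_batch_body(requests, boundary=None):
    """Wrap (method, path, headers, body) tuples as application/http parts.

    Returns (content_type, body). Illustrative layout only, not the
    proposal's final wire format.
    """
    boundary = boundary or "batch_" + uuid.uuid4().hex
    lines = []
    for index, (method, path, headers, body) in enumerate(requests):
        lines.append("--" + boundary)
        lines.append("Content-Type: application/http")
        # Hypothetical id used to match each sub-response to its request.
        lines.append("Content-ID: <req-%d>" % index)
        lines.append("")
        lines.append("%s %s HTTP/1.1" % (method, path))
        for name, value in headers.items():
            lines.append("%s: %s" % (name, value))
        lines.append("")
        if body:
            lines.append(body)
    lines.append("--" + boundary + "--")
    lines.append("")
    return 'multipart/mixed; boundary="%s"' % boundary, "\r\n".join(lines)


# Hypothetical paths: add a friend, then read one page of the friend list.
content_type, body = build_batch_body(
    [
        ("POST", "/people/@me/@friends",
         {"Content-Type": "application/json"}, '{"id": "example"}'),
        ("GET", "/people/@me/@friends?startIndex=40&count=20", {}, ""),
    ],
    boundary="batch_example",
)
```

Each part is a complete HTTP request in its own right, which is what lets the sub-requests keep their ordinary method and header semantics.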

I do have some concerns about generic batching, and hence asked
earlier to see if there are strong use cases driving this desire. My
concerns include:

a. Batch does not necessarily speed up things for the client. The more
generic the batch protocol is, the less likely it is to parallelize
request processing.

b. Batch requests will miss caches, unless some extra work is done by
all caches and intermediaries. This is less likely to happen as this
specification is outside the realm of RFC 2616.

c. More importantly, programming batch requests is not necessarily
simpler. Here is why.
i. We will need special client libraries (say, in Java, PHP,
JavaScript, etc.) to take an arbitrary number of requests and wrap
them in a batch envelope.
ii. Error handling would be more complex. Imagine 1 POST, 2 PUTs,
and 3 DELETEs failing in a batch of 100 requests.
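
The partial-failure case in (ii) can be sketched as follows. This assumes each sub-response carries its own status code and can be matched back to its request; the SubResult type and the helper names are hypothetical.

```python
from collections import namedtuple

# Hypothetical record for one sub-response in a batch.
SubResult = namedtuple("SubResult", "index method path status")


def split_results(results):
    """Partition sub-responses into successes (2xx) and failures."""
    ok = [r for r in results if 200 <= r.status < 300]
    failed = [r for r in results if not 200 <= r.status < 300]
    return ok, failed


def summarize_failures(failed):
    """Count failures per HTTP method."""
    counts = {}
    for r in failed:
        counts[r.method] = counts.get(r.method, 0) + 1
    return counts


# 94 successful GETs plus the six failures from the example above.
results = [SubResult(i, "GET", "/item/%d" % i, 200) for i in range(94)]
failing = [("POST", 409), ("PUT", 412), ("PUT", 412),
           ("DELETE", 404), ("DELETE", 404), ("DELETE", 403)]
for offset, (method, status) in enumerate(failing):
    results.append(SubResult(94 + offset, method, "/item/x", status))

ok, failed = split_results(results)
summary = summarize_failures(failed)
```

Even in this small form, the caller has to decide what to do about each failed write individually, which is the added complexity being pointed out.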

Please do not get me wrong. Batching may be important, but at this
stage, the additional complexity may outweigh the benefits.

An alternative approach worth considering is to identify specific
batch use cases (such as "update my address book", or "get my profile
and all my contacts' profiles"), and provide specific URIs and
representations for those.

Subbu

On Apr 7, 6:06 pm, "John Panzer" <jpan...@google.com> wrote:
> http://blogs.msdn.com/astoriateam/archive/2008/04/06/batching-data-se...

Kevin Marks

Apr 8, 2008, 4:10:20 PM

to opensocial-an...@googlegroups.com

This follows other batch models I've seen in concatenating standalone single-entry Atom feeds, which seems very strange to me. The de facto way of batching where feeds are concerned is to treat the feed as a collection of entries to be manipulated en masse.

Why not allow simple batching (with a single operation) by allowing a Feed (or feed URL) to be passed in instead of a single Entry to the operation?
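
A minimal sketch of that idea, assuming the operation accepts one Atom feed whose entries are the items to act on. The function name and the entry fields are hypothetical, and real entries would also need ids, authors, and timestamps.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def entries_to_feed(entries):
    """Build one Atom feed holding every (title, content) pair as an entry."""
    feed = ET.Element("{%s}feed" % ATOM_NS)
    for title, content in entries:
        entry = ET.SubElement(feed, "{%s}entry" % ATOM_NS)
        ET.SubElement(entry, "{%s}title" % ATOM_NS).text = title
        ET.SubElement(entry, "{%s}content" % ATOM_NS).text = content
    return ET.tostring(feed, encoding="unicode")


# Two items sent to a single operation as one feed document.
feed_xml = entries_to_feed([
    ("First activity", "Went hiking"),
    ("Second activity", "Uploaded photos"),
])
```

The same operation applies to every entry, so no per-entry method or envelope is needed.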

On Mon, Apr 7, 2008 at 7:32 PM, Kevin Brown <et...@google.com> wrote:

John Panzer

Apr 8, 2008, 6:19:47 PM

to opensocial-an...@googlegroups.com

I plan to update the proposal to match the application/http semantics (it seems cleaner). My only real question is whether to use BATCH, POST, or allow either. Thoughts?

Specific responses below.

On Tue, Apr 8, 2008 at 10:26 AM, Subbu Allamaraju <subbu.al...@gmail.com> wrote:

> Hi John,
>
> Between the one outlined in your proposal, and the one used in the one
> below, the latter has some merits as it relies on application/http,
> which in theory, has well-defined semantics. I say "in theory" because
> I am not aware of any proxies or caches that can understand this
> content-type.
>
> Any reason to use X-Batch-Operation over application/http?
>
> I do have some concerns about generic batching, and hence asked
> earlier to see if there are strong use cases driving this desire. My
> concerns include:
>
> a. Batch does not necessarily speed up things for the client. The more
> generic the batch protocol is, the less likely it is to parallelize
> request processing.

Absolutely. So a protocol that makes it trivial to batch or not batch, and experiment to determine what works best, helps with optimization IMHO.

> b. Batch requests will miss caches, unless some extra work is done by
> all caches and intermediaries. This is less likely to happen as this
> specification is outside the realm of RFC 2616.

Yep. Many of the legitimate uses are for things that would likely cause cache misses anyway. Note that caching can still happen at the client and at the origin server, as they understand the semantics and can use sub-request caching headers, but intermediaries would need to be modified to do so. I think this is a theoretical rather than a practical issue for the intended usage, though.

> c. More importantly, programming batch requests is not necessarily
> simpler. Here is why.
> i. We will need special client libraries (say in Java, PHP,
> JavaScript etc) to take an arbitrary number of requests and envelope
> them into a batch

To be clear, I consider batching more complex. It's a (potential) latency optimization, and optimizations only rarely simplify code.

> ii. Error handling would be more complex. Imagine 1 POST, 2 PUTs
> and 3 DELETEs failing in a batch of 100 requests.

In a callback-based environment, the library can take care of this (once) and shield the client from dealing with it. This would be the case for the OpenSocial JavaScript APIs, for example.
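
A minimal sketch of that division of labor, assuming a transport that returns one (status, body) pair per sub-request in order. The CallbackBatch class and fake_send_batch are hypothetical and are not the OpenSocial API.

```python
class CallbackBatch:
    """Collects requests, sends them once, then fans results out to callbacks."""

    def __init__(self, send_batch):
        # send_batch is a hypothetical transport: a list of requests in,
        # a list of (status, body) pairs out, in the same order.
        self._send_batch = send_batch
        self._pending = []

    def add(self, request, on_success, on_error):
        self._pending.append((request, on_success, on_error))

    def execute(self):
        pending, self._pending = self._pending, []
        responses = self._send_batch([request for request, _, _ in pending])
        for (request, on_success, on_error), (status, body) in zip(pending, responses):
            if 200 <= status < 300:
                on_success(request, body)
            else:
                on_error(request, status)


def fake_send_batch(requests):
    # Stand-in for the network: every DELETE fails with 403, the rest succeed.
    return [(403, "") if r.startswith("DELETE") else (200, "ok") for r in requests]


succeeded, failed = [], []
batch = CallbackBatch(fake_send_batch)
batch.add("GET /friends",
          lambda req, body: succeeded.append(req),
          lambda req, status: failed.append((req, status)))
batch.add("DELETE /friends/42",
          lambda req, body: succeeded.append(req),
          lambda req, status: failed.append((req, status)))
batch.execute()
```

The caller only ever sees per-request callbacks; the matching of sub-responses to requests is written once, inside the library.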

> Please do not get me wrong. Batching may be important, but at this
> stage, the additional complexity may outweigh the benefits.
>
> An alternative approach worth considering is to identify specific
> batch use cases (such as "update my address book", or "get my profile
> and all my contacts's profiles"), and provide specific URIs and
> representations to those.

The killer use case mentioned in the doc is "Add this friend and then get the 3rd page from the top of my friend list". One can imagine more of these types of requests. If you have a specific counter-proposal for the specific use case, that would be a good starting point.

Subbu Allamaraju

Apr 10, 2008, 4:02:25 PM

to opensocial-an...@googlegroups.com

On Apr 8, 2008, at 3:19 PM, John Panzer wrote:
> I plan to update the proposal to match the application/http
> semantics (it seems cleaner). My only real question is whether to
> use BATCH, POST, or allow either. Thoughts?

The spec may have better luck with adoption if POST is used. A lot more
needs to be defined for a new verb like BATCH, which should happen under
the IETF and not here.

> a. Batch does not necessarily speed up things for the client. The more
> generic the batch protocol is, the less likely it is to parallelize
> request processing.
>
> Absolutely. So a protocol that makes it trivial to batch or not
> batch, and experiment to determine what works best, helps with
> optimization IMHO.

Sorry - I don't follow.

> b. Batch requests will miss caches, unless some extra work is done by
> all caches and intermediaries. This is less likely to happen as this
> specification is outside the realm of RFC 2616.
>
> Yep. Much of the legitimate uses are for things that would likely
> cause cache misses anyway. Note that caching can still happen at
> the client and at the origin server as they understand the semantics
> and can use sub-request caching headers, but intermediaries would
> need to be modified to do so. I think this is a theoretical rather
> than a practical issue for the intended usage, though.

Clients like browsers can't cache the responses either. The request is
potentially unsafe, and the URI and response headers won't have enough
information to evaluate cacheability.

>
> In a callback-based environment, the library can take care of this
> (once) and shield the client from dealing with it. This would be
> the case for the OpenSocial Javascript APIs for example.

True.

Subbu

John Panzer

Apr 10, 2008, 4:26:19 PM

to opensocial-an...@googlegroups.com

On Thu, Apr 10, 2008 at 1:02 PM, Subbu Allamaraju <su...@subbu.org> wrote:

> On Apr 8, 2008, at 3:19 PM, John Panzer wrote:
>
>> ...
>
>> a. Batch does not necessarily speed up things for the client. The more
>> generic the batch protocol is, the less likely it is to parallelize
>> request processing.
>>
>> Absolutely. So a protocol that makes it trivial to batch or not
>> batch, and experiment to determine what works best, helps with
>> optimization IMHO.
>
> Sorry - I don't follow.

If you want parallelization, use separate HTTP requests in parallel. Alternatively, if you want to combine requests and do them serially but avoid multiple round trips, use batching. It's impossible to tell what will be best in the abstract. Clients (specific clients with specific data needs) should run it both ways and measure; then choose the one that works the best. In this environment, being able to _easily_ switch back and forth is important, because an optimization that's too much work to try won't ever get implemented.
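
A minimal sketch of what "easy to switch" could look like, assuming the client hides both approaches behind one call. The function names and the fake fetchers are hypothetical stand-ins for real network code.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_parallel(requests, fetch_one, max_workers=4):
    """One HTTP request per item, issued concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_one, requests))


def run_batched(requests, fetch_batch):
    """All items in a single round trip, processed serially by the server."""
    return fetch_batch(requests)


def fake_fetch_one(request):
    # Stand-in for a single HTTP round trip.
    return "result for " + request


def fake_fetch_batch(requests):
    # Stand-in for one batch round trip carrying every request.
    return [fake_fetch_one(r) for r in requests]


STRATEGIES = {
    "parallel": lambda reqs: run_parallel(reqs, fake_fetch_one),
    "batched": lambda reqs: run_batched(reqs, fake_fetch_batch),
}


def fetch_all(requests, strategy="batched"):
    """Same inputs and outputs either way, so switching is a one-word change."""
    return STRATEGIES[strategy](requests)


def timed(strategy, requests):
    """Return (results, elapsed_seconds) for one strategy."""
    start = time.perf_counter()
    results = fetch_all(requests, strategy)
    return results, time.perf_counter() - start


requests = ["GET /people/@me", "GET /people/@me/@friends"]
parallel_results, parallel_seconds = timed("parallel", requests)
batched_results, batched_seconds = timed("batched", requests)
```

With real network calls in place of the fakes, comparing the two elapsed times on a client's actual workload is the measurement being suggested.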

>> b. Batch requests will miss caches, unless some extra work is done by
>> all caches and intermediaries. This is less likely to happen as this
>> specification is outside the realm of RFC 2616.
>>
>> Yep. Much of the legitimate uses are for things that would likely
>> cause cache misses anyway. Note that caching can still happen at
>> the client and at the origin server as they understand the semantics
>> and can use sub-request caching headers, but intermediaries would
>> need to be modified to do so. I think this is a theoretical rather
>> than a practical issue for the intended usage, though.
>
> Clients like browsers can't cache the responses either. The request is
> potentially unsafe, and the URI and response headers won't have enough
> information to evaluate cacheability.

In this case, the client is the AJAX code executing inside the browser, and it can cache the results (at least within the session and, in the presence of things like Gears, across sessions as well).
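
A minimal sketch of such an application-level cache, assuming each sub-response exposes a max-age value taken from its own caching headers. The class is hypothetical, and it is written in Python only to show the logic; browser code would follow the same shape.

```python
import time


class SubResponseCache:
    """Caches GET sub-responses by URI until their max-age expires."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}

    def store(self, method, uri, body, max_age):
        # Only safe requests are cached; writes in the same batch are skipped.
        if method != "GET" or max_age <= 0:
            return
        self._entries[uri] = (body, self._clock() + max_age)

    def lookup(self, uri):
        entry = self._entries.get(uri)
        if entry is None:
            return None
        body, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[uri]
            return None
        return body
```

The batch envelope itself is never cached; only the individual sub-responses are, keyed by the URI each one would have had as a standalone request.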

John

Kevin Brown

Apr 10, 2008, 7:08:33 PM

to opensocial-an...@googlegroups.com

On Thu, Apr 10, 2008 at 1:02 PM, Subbu Allamaraju <su...@subbu.org> wrote:

> On Apr 8, 2008, at 3:19 PM, John Panzer wrote:
>> I plan to update the proposal to match the application/http
>> semantics (it seems cleaner). My only real question is whether to
>> use BATCH, POST, or allow either. Thoughts?
>
> The spec may have better luck with adoption if POST is used. A lot more
> needs to be defined for a new verb like BATCH, which should happen under
> IETF and not here.

I agree -- a change that requires a new http verb seems both unnecessary and unrealistic.

--
~Kevin
