I'd like to suggest a rearrangement of the batch API to separate JS->Container batching (a performance mechanism) from JS->Developer batching (a convenience mechanism).

1. All data APIs take a callback function and are guaranteed never to synchronously enter that function. If the data is already available, the API must set a timeout and wait for JS to be idle.
Example:

function hasViewer( v ) { ... }
var service = opensocial.data.getPeopleService();
service.getPerson(
  { userid: '@viewer', fields: ['name', 'profileUrl'], keys: ['gifts'] }, hasViewer );
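The never-synchronous guarantee in point 1 could be sketched like this. Everything below is hypothetical service internals for illustration (makePeopleService, the cache, and fetchFromContainer are not part of the proposal), not the real opensocial implementation:

```javascript
// Sketch: a service that never invokes its callback synchronously, even
// when the result is already cached. All names are hypothetical.
function makePeopleService(cache) {
  function fetchFromContainer(params, callback) {
    // Stand-in for a real asynchronous container request.
    setTimeout(function () { callback({ userid: params.userid }); }, 0);
  }
  return {
    getPerson: function (params, callback) {
      var cached = cache[params.userid];
      if (cached !== undefined) {
        // Data is already available: still defer, so the caller's stack
        // always unwinds before the callback runs.
        setTimeout(function () { callback(cached); }, 0);
      } else {
        fetchFromContainer(params, callback);
      }
    }
  };
}
```

Either way the caller observes the same contract: the callback fires on a later tick, never re-entering the code that issued the request.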
2. A user can organize several calls into a batch. The batch defers its callback until all individual callbacks have completed, then assembles the callback results into a single object.

function hasAllData( data ) { data.viewer, data.viewerFriends ... }
var batch = opensocial.util.makeBatch( hasAllData );
var service = opensocial.data.getPeopleService();
service.getPerson(
  { userid: '@viewer', fields: ['name', 'profileUrl'], keys: ['gifts'] }, batch( 'viewer' ) );
service.getPerson(
  { userid: '@viewer', fields: ['name', 'profileUrl'], keys: ['gifts'] }, batch( 'viewerFriends' ) );

Consequences: The decision about which operations should be batched into a single HTTP call lives inside the opensocial library, but the library doesn't need to track client batches. Client batching can be independent of the request generator (and can be cajoled as a result).
This syntax is definitely more elegant - it means that the second parameter to calls can always be a function. I have a slight preference for this suggestion, but would love to hear from others.
On Mon, Nov 10, 2008 at 3:09 PM, John Hayes <john.mar...@gmail.com> wrote:
The timeout is a latency issue, and I think we need the option for synchronous processing: even setTimeout(0) can end up waiting a long time while the browser does other work. With preloading, this data is often already available, so requiring async introduces latency.
If it's problematic to support in all call contexts, we can state that we only support synchronous callbacks while processing initial load handlers.
It wasn't clear to me why this syntax makes a functional difference. It seems that

service.getPerson({userid: '@viewer'}, batch('viewer'));

and

service.getPerson({userid: '@viewer'}, batch, 'viewer');

can be implemented almost equivalently.
Is this a convenience so that service implementers don't have to check whether the 2nd param is a batch?
Comments inline:
On Mon, Nov 10, 2008 at 4:58 PM, Evan Gilbert <uid...@google.com> wrote:
If it's problematic to support in all call contexts, we can state that we only support synchronous callbacks while processing initial load handlers.

I think this is an important issue for behavioral consistency. I've found with other APIs that spurious synchronous callbacks made the code generally less stable (since developers often didn't test one of the cases) and more complex to unit test, since the number of combinations of callbacks can be quite large.

If you're concerned about making it faster without a timer, I'd recommend generating code at strategic points: after a gadget's source, make a call to immediately complete events; Caja-transformed code provides a flushing opportunity at each event handler. All of these are implementation specific and probably shouldn't be mentioned in the spec except as implementation guidance.
Is this a convenience so that service implementers don't have to check whether the 2nd param is a batch?

The parity I'm seeking is between:

service.getPerson( { ... }, gotMyViewer );

and

service.getPerson( { ... }, batch( 'viewerInBatch' ) );

The service provider is not required to know whether a client batch is being used, how that batch is assembled, or when the downstream callback may occur. To make it more concrete, here's a sample implementation of gadget.util.makeBatch:

gadget.util.makeBatch = function( callback ) {
  var params = {};
  var count = 0;
  return function( paramName ) {
    count = count + 1;
    return function( result ) {
      params[ paramName ] = result;
      count = count - 1;
      if ( count == 0 )
        callback( params );
    };
  };
};

Inside, the service always just calls callback( result ).
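To show how the pieces fit, here is a self-contained sketch of the same pattern (makeBatch is restated so the snippet runs on its own, and the service call is a stub, not the real API):

```javascript
// Client-side batching sketch: makeBatch collects named results and fires
// the final callback once all outstanding sub-callbacks have completed.
function makeBatch(callback) {
  var params = {};
  var count = 0;
  return function (paramName) {
    count = count + 1;
    return function (result) {
      params[paramName] = result;
      count = count - 1;
      if (count === 0) callback(params);
    };
  };
}

// Stub service: always delivers asynchronously, as the proposal requires.
function getPerson(params, callback) {
  setTimeout(function () { callback({ id: params.userid }); }, 0);
}

var batch = makeBatch(function (data) {
  // Both results are present here, keyed by the names passed to batch().
  console.log(data.viewer, data.viewerFriends);
});
getPerson({ userid: '@viewer' }, batch('viewer'));
getPerson({ userid: '@viewer' }, batch('viewerFriends'));
```

Note that this relies on all batch(name) registrations happening before any callback fires, which the never-synchronous guarantee in point 1 ensures.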
I like where this proposal is heading. A couple of points...

- While having the ability to either batch or not batch is potentially more elegant, the mere fact that batching is forced on developers today encourages them to think about their data fetching in a latency-sensitive way. Without HTTP batching, the latency of many applications on Orkut would be intolerable in regions where high network latency is common. Making it conceptually optional increases the likelihood that some developers will batch and some won't. I'd prefer to always use a batching interface and allow containers to implement how they see fit underneath.
- I think we can eliminate the distinction between getPerson and getPeople
- If these APIs were to support something like updating a person entry, what would the parameter list look like? There's a big difference between the opensocial.Person interface and the protocol-level JSON encoding of a person.
- Callbacks must be async even if we have to eat the latency overhead of setTimeout(0) on occasion. This has already bitten us.
- Do we want to go the last step and drop opensocial.Person, opensocial.Activity ... and just expose the JSON encoded protocol types. I really don't know how much these strong types are helping anyone. Eliminating them would help containers focus on making their protocol implementations more compatible. This is ultimately what I would like to see be the direction of opensocial in-browser. High quality easily consumable protocols and a thin JS API so developers don't have to do per-container setup to establish connections.
How about something like

var batch = opensocial.newBatch();
batch.people.get({ userid: '@viewer', fields: ['name', 'profileUrl'], keys: ['gifts'] }, "viewer");
batch.people.get({ userid: '@viewer', groupId: '@friends', fields: ['name', 'profileUrl'], keys: ['gifts'] }, "viewerFriends");
batch.http.get("http://www.example.org/somefeed", "myFeed");
batch.execute(function callback(data) {
  data.viewer...
  data.viewerFriends...
  data.myFeed...
});

- batch is explicit and required
- batch.<type> where <type> matches the type names defined in the REST spec [people, activities, data, messages]
- batch.<type>.<operation> where <operation> matches get | put/update | post/create | delete (including CRUD synonyms for REST)
- I threw in an 'http' service to simulate makeRequest calls
- Developers can introspect the set of available services via if (batch.http) {} etc.
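One way this explicit batch/execute surface might be wired up is sketched below. The transports are stubs and every name (newBatch, makeService, the result shapes) is an assumption made for illustration; a real container would coalesce the queued operations into a single HTTP request inside execute():

```javascript
// Sketch of the proposed explicit-batch surface. Operations are queued
// until execute() runs them and gathers results under their keys.
function newBatch() {
  var ops = [];
  function makeService(fetch) {
    return {
      get: function (params, key) {
        ops.push({ key: key, params: params, fetch: fetch });
      }
    };
  }
  return {
    people: makeService(function (params, done) {
      // Stub: a real container would batch this into one HTTP call.
      setTimeout(function () { done({ id: params.userid }); }, 0);
    }),
    http: makeService(function (url, done) {
      // Stub standing in for makeRequest.
      setTimeout(function () { done('feed-body for ' + url); }, 0);
    }),
    execute: function (callback) {
      var data = {};
      var pending = ops.length;
      ops.forEach(function (op) {
        op.fetch(op.params, function (result) {
          data[op.key] = result;
          pending -= 1;
          if (pending === 0) callback(data);
        });
      });
    }
  };
}

// Usage mirroring the proposal:
var batch = newBatch();
batch.people.get({ userid: '@viewer' }, 'viewer');
batch.http.get('http://www.example.org/somefeed', 'myFeed');
batch.execute(function (data) {
  console.log(data.viewer, data.myFeed);
});
```

Introspection via if (batch.http) falls out naturally, since optional services are simply absent properties on the batch object.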
On Tue, Nov 11, 2008 at 12:09 AM, Louis Ryan <lr...@google.com> wrote:
I'm in favor of the calls that look like individual functions, for a few reasons:
1. It's a nicer API for developers if we can swing it.
2. They work much more cleanly for the update use cases.
3. Nothing is preventing us from batching underneath this API. We can keep a list of pending requests and call setTimeout(flushPendingRequestsInBatch, 0). Note that you may need to do this even with explicit batching, as you would want to combine requests from multiple gadgets.
4. It's not clear that the right strategy will always be to make one HTTP batch. For example, you might want to make 2 requests - one to the home server for social calls, and the other to arbitrary sites - so that you can take advantage of multiple connections in the browser.
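Point 3's pending-list approach might look like the following. These are hypothetical container internals (pendingRequests, flushPendingRequestsInBatch, and the stubbed response shape are all invented for illustration):

```javascript
// Sketch of container-side transparent batching: individual getPerson
// calls made in the same JS tick are queued, then flushed together on
// the next tick, mimicking a single batched HTTP request.
var pendingRequests = [];
var flushScheduled = false;

function getPerson(params, callback) {
  pendingRequests.push({ params: params, callback: callback });
  if (!flushScheduled) {
    flushScheduled = true;
    setTimeout(flushPendingRequestsInBatch, 0);
  }
}

function flushPendingRequestsInBatch() {
  var batchRequests = pendingRequests;
  pendingRequests = [];
  flushScheduled = false;
  // A real container would issue one HTTP call here covering the whole
  // batch, then dispatch each response to its callback.
  batchRequests.forEach(function (req) {
    req.callback({ id: req.params.userid });
  });
}
```

Because the flush happens after the current stack unwinds, this also delivers the never-synchronous callback guarantee from the original proposal for free.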
- I think we can eliminate the distinction between getPerson and getPeople
I've never liked calls that return an array when you ask for one item by ID, especially for the common use cases, but I could be swayed either way on this one.
Also, I think that the object that does batching should be different from the object(s) that create requests. Combining all of these makes it harder to override specific services or the batching mechanism.
Per notes above, would prefer to keep the callback option. Is there a reason why we couldn't execute the following in a batch?
Instead of calling the mechanism ‘batch’ we could also call it ‘queue’. I think the proposal is already veering towards queue semantics. This would offer containers the ability to send one request or many, depending on their processing model and how they want to service requests.
FWIW, MySpace has stayed away from the batch processing mechanisms because we believe that the Data Pipelining/OSML mechanisms offer superior performance capabilities for what I’ll call the 56K modem use cases.
As interesting as the batching/queuing discussion is, I’d prefer to not worry about this part.
With the current set of proposals, we can approximate the feature and get all the benefits for low bandwidth. DataPipelining and named views (another pair of proposals) make this possible. In a world where
requestNavigateTo("Canvas.PersonalItemsView")
can be executed, with Data Pipelining tags available, do we need to define another batching/queuing mechanism?
I like everything about this proposal except for the batching piece. Is it possible to drop the ideas around batching until v.Next?
FWIW, I’m not saying that batching is never going to be helpful. I’d just prefer to get some experience with using Data Pipelining and the extended view name item we are pushing in 0.9. If, in v.Next, batching looks valuable, let’s add it in then. Right now, batching is derailing the conversation and all the good we could create with a simplified API.
The lightweight API doesn’t ‘exist’ yet—there is nothing to remove.
If we must incorporate batching into the spec, it needs to be in a separate, optional section.