Push vs. Fetch/Cache/Invalidate


chl...@yahoo.com

Sep 30, 2008, 12:53:34 PM
to OpenSocial - OpenSocial and Gadgets Specification Discussion
A couple of container groups have expressed the need for a push model when
dealing with content. The use case is like this:

- The user installs an OpenSocial app from a 3rd party onto the
container site.
- The container site has a strict SLA which requires the app to be
rendered within a given time frame. This requirement means that all
content used for rendering the view must be stored locally.
- The user performs some activity on a 3rd-party site, which results
in some data about this user changing.
- The app's view on the container's site needs to reflect the changes
when it is rendered.


There are several approaches for this problem:


1. Push Model: There are two alternatives in this model:
a. Facebook model: Facebook provides a setFBML API to let developers
push personalized app view code to the container. This view code is
stored inside the container and is ready to be used when the app is
rendered.
b. Push Content Model: Instead of pushing the complete personalized
view code, only the data is pushed. The view's code is shared across
users.


2. Fetch/Cache/Invalidate Model: In this model, data is fetched and
cached. TTLs are respected in determining the lifetime of a cache record.
The container is also required to provide an API that allows developers
to invalidate specific cache records. On top of this, the container can
decide whether and when to pre-fetch.
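
For concreteness, a minimal Python sketch of model 2 (the class, method, and key names are illustrative only, not a spec proposal):

import time

class InvalidatingCache:
    def __init__(self):
        self.store = {}  # key -> (expires_at, content)

    def get(self, key, fetch):
        entry = self.store.get(key)
        if entry and entry[0] > time.time():
            return entry[1]  # fresh cache hit, no round trip
        content, ttl = fetch(key)  # fetch from the 3rd-party server
        self.store[key] = (time.time() + ttl, content)
        return content

    def invalidate(self, key):
        # Exposed to developers through a container API so that a change
        # on the 3rd-party site can take effect before the TTL expires.
        self.store.pop(key, None)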


Are there any other approaches to this problem? Which approach is
optimal, or do we need a hybrid approach? Any other thoughts?


-Charlie

Scott Seely

Sep 30, 2008, 1:02:51 PM
to opensocial-an...@googlegroups.com
An application can store its own data in AppData. It should be possible
to store the AppData via the RESTful API from the 3rd-party site, or via
the JS API if the application is running in the container. At render
time, the PersonAppDataRequest tag in OSML would be used to put that
data into the gadget. Whether the rendering is 'push' or
'pull' then depends on where the markup is rendered: at the container
server or at the client. The push becomes an optimization in your
implementation of OSML.

See http://wiki.opensocial-templates.org/index.php?title=OpenSocial_Markup#.3Cos:PersonAppDataRequest.3E for tag details, and let us know if any changes are needed to the tag.
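
As a rough illustration of the push-from-the-3rd-party-site flow: the Python sketch below assumes an appdata endpoint of the /appdata/{guid}/{selector}/{appid} shape used by the RESTful protocol; the host, ids, and data are placeholders, and OAuth request signing is omitted.

import requests

# Hypothetical container endpoint; host and ids are placeholders.
url = "http://container.example.com/social/rest/appdata/12345/@self/6789"

# Push the data the gadget will render from; at render time an
# os:PersonAppDataRequest tag can pull these values back out.
requests.put(url, json={"latest_score": "4200"})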

Kevin Brown

Sep 30, 2008, 1:08:37 PM
to opensocial-an...@googlegroups.com
The problem with that approach is that it would require that your gadget's template be static, and you couldn't push in a rendered view of the final content.

To answer Charlie's original question: I believe that the pull / cache / invalidate approach is the easiest to scale for both developers and containers. The cost of persistent storage can be quite high (backups, replication, etc.), whereas caches can be made extremely cheaply.

Louis Ryan

Sep 30, 2008, 1:34:50 PM
to opensocial-an...@googlegroups.com
I think the templating case is orthogonal to the content management requirements.

Not sure about this, but I would imagine that you could use os:include (?) to inline template content fetched from a URL, which would be managed by this scheme, and that os:include could be processed server-side?

I also believe that a pull/invalidate/expire model is the best way for containers to manage large quantities of per-user markup defined by third parties, because of the costs involved in push: not just from the container's perspective, because of the significantly higher storage costs involved, but also from the developer's standpoint, because of the cost of pushing content to the full install base periodically.

Scott Seely

Sep 30, 2008, 1:35:51 PM
to opensocial-an...@googlegroups.com

Looking at this, it seems that a server-based OSML implementation could push via the tag and achieve a faster ‘first render’, which is what the question appeared to be about.

 

You are correct that there is a different level of flexibility by doing everything at the client, but you pay for the extra flexibility by getting to ‘first render’ a bit later due to a few more round trips to the server. The cost of getting the data out depends on a lot of factors, so I can neither agree nor disagree with your point about scale. Scaling at the server vs. the client has a lot to do with the architecture in use. For example, Shindig appears to be built to scale at the client. (I might be misreading the code and architectural docs, though…)

Evan Gilbert

Sep 30, 2008, 1:47:33 PM
to opensocial-an...@googlegroups.com
Charlie - thanks for kicking off the discussion on this. We have similar needs for our rendering and I'd like to see an enhancement for the 0.9 spec dealing with this.

I really like options 2 and 1b. Both require minimal spec changes, and are simple from the developer's perspective:
- #2 only requires a new invalidation API on top of the existing proposal for type="proxied".
- #1b is already possible with app data, and has the advantage that content can be dynamic based on both viewer and owner (yes, there are a few dynamic things that can be done with cached content based on owner, but it is limited). Per Scott's notes, it can be quite efficient from the server's perspective (with no round trips to 3rd-party sites) using templates.

I can be convinced that 1a is also useful and worthy of being in the spec, but would prefer to see first if these two simpler changes can meet developer needs.


Scott Seely

Oct 27, 2008, 4:11:25 PM
to OpenSocial - OpenSocial and Gadgets Specification Discussion
Do we have an updated proposal?


Chris Chabot

Nov 2, 2008, 8:12:02 AM
to opensocial-an...@googlegroups.com
I think from the reactions we can conclude that proposal #2 is seen by everyone as the best solution, to quote:


"2. Fetch/Cache/Invalidate Model: In this model, data are fetched and
cached. TTL are respected in determining the life of the cache
record.
The container is also required to provide an api to allow developers
invalidate certain cache record. On top of this, the container can
decide wheither and when to do a pre-fetch."

So the content is fetched from the dev's server, and cached on the container while respecting TTLs.

Since we already use standard HTTP TTL headers, that seems to be the only consistent way for the dev to define the TTL.

Defining the fetch part is part of the data pipelining / proxied content discussion, so I don't think we have to make that part of this discussion.

The invalidation does need a spec ... since we already have a REST API that is a MUST in the OpenSocial Specification, I would suggest we add an endpoint to it for this purpose.

I guess the one open issue is to come up with what goes into the DELETE action URL. Looking at things from just the gadget point of view: an app id, a mod id (and optionally a group/user id, or is this action by definition for 'all'?).
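
Purely as a strawman (the endpoint path and parameter names below are invented for illustration, pending an actual spec), the gadget-centric version might look like:

import requests  # authentication omitted for brevity

# Hypothetical invalidation endpoint; every part of this URL is made up.
requests.delete(
    "http://container.example.com/social/rest/cache",
    params={"appId": "6789", "modId": "42", "userId": "12345"},
)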

However, there is a good chance that something like an invalidation schema could also be useful at some point for other types of apps (think mobile apps that are REST-only), which will not have an app and mod id, so some other schema might be more appropriate there if we feel it should be future-proof for such situations.

   -- Chris

Kevin Brown

Nov 3, 2008, 2:31:08 AM
to opensocial-an...@googlegroups.com
On Sun, Nov 2, 2008 at 6:12 AM, Chris Chabot <cha...@google.com> wrote:
I guess the one open issue is to come up with what goes into the DELETE action URL. Looking at things from just the gadget point of view: an app id, a mod id (and optionally a group/user id, or is this action by definition for 'all'?).

Evan had a good suggestion that is easy to implement and scales pretty well: invalidation by app id and user id (neither owner nor viewer). In practice, this can be implemented by storing a timestamp for a given user and using it as part of the cache key, so that replacing the timestamp invalidates everything derived from it. The flow would go something like this:

string get_key(string data_key, int app_id, int[] user_ids) {
  // Build the cache key from the data key (typically a URL) plus a
  // per-user component for each user associated with the entry.
  string key = data_key;
  foreach (user_ids as user_id) {
    key += get_user_key(app_id, user_id);
  }
  return hash(key);
}

string get_user_key(int app_id, int user_id) {
  int timestamp = get_timestamp_for_user(app_id, user_id);
  if (timestamp == -1) {
    // No marker yet for this (app, user) pair; create one.
    timestamp = create_and_store_timestamp_for_user(app_id, user_id);
  }
  // Invalidation replaces the stored timestamp, which changes every
  // cache key derived from it without touching the cached entries.
  return string(app_id) + string(user_id) + string(timestamp);
}

In a typical shared cache, such as memcached, the timestamps themselves can easily be stored in the same place as the data.

data_key would generally be a URL, but it doesn't necessarily have to be.
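
To round this out: invalidation in this scheme is just replacing the stored timestamp. A Python sketch, assuming a memcached-style client object:

import time

def invalidate_user(cache, app_id, user_id):
    # Replacing the timestamp changes every cache key derived from it,
    # so all entries for this (app, user) pair become unreachable and
    # simply age out of the cache; nothing is deleted eagerly.
    cache.set("ts:%d:%d" % (app_id, user_id), int(time.time()))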
 

Charlie Jiang

Nov 3, 2008, 2:11:05 PM
to opensocial-an...@googlegroups.com

John@Yahoo has a detailed write-up about the invalidation protocol. It will be posted in the next couple of days.

 

-Charlie

 


John Hayes

Nov 5, 2008, 1:22:02 PM
to opensocial-an...@googlegroups.com
I wanted to post a quick followup.
 
I should be able to get something to the group by Saturday, as on Friday I'm meeting with Mark Nottingham to review the differences between my current implementation and his IETF proposal for cache channels. Anyone interested in caching can review his proposal at:
 

 http://ietfreport.isoc.org/idref/draft-nottingham-http-cache-channels/

 http://www.mnot.net/cache_channels/

Another data point, which isn't called a caching technology but is structurally similar, is FriendFeed's Simple Update Protocol:

 
GNIP's API includes invalidation by URL:
 
 
My goals vary a little from these protocols; without writing out a spec, I'll just list them to get some sanity-feedback:
 
* The protocol may be applied to any resource retrieved by the container (including gadget.xml), whether initiated directly by the container or by JavaScript. It will be optional for all resources, but a container may choose to degrade quality of service based on resource availability. If either a container or a developer does not implement invalidation, behavior will fall back to standard HTTP.
* While URLs may be used to address resources for invalidation, it is assumed that this is inadequate for all cases. Some URLs contain difficult-to-predict machine data and some have header-driven variants that are not addressable. Resources will be addressed by a selection of system-default keys (like URL and opensocial ID) and developer-supplied keys that represent their internal data structures (see the sketch after this list).
* The system should function normally in the face of complete database loss by the container or the developer and have a bounded time to detect this condition.
* No polling-based invalidation schemes.
* The protocol will be equally applicable, though optional, for containers that support cache invalidation on their own web service APIs.
* Developers must be allowed to scope their invalidations to documents the container has actually observed.
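
As a sketch of how developer-supplied keys might ride along on an HTTP response (the "X-Invalidation-Keys" header name below is invented for illustration; the real mechanism is exactly what this proposal needs to define):

# Response headers a developer's server might attach to a cacheable resource.
headers = {
    "Cache-Control": "public, max-age=3600",         # standard HTTP TTL
    "X-Invalidation-Keys": "user:12345, album:987",  # keys from the developer's own data structures
}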
 
Comments welcome,
 
John

Louis Ryan

Nov 5, 2008, 2:44:05 PM
to opensocial-an...@googlegroups.com
John

One tension that I think needs to be resolved is the varying degree to which containers are willing to support an arbitrary synthetic keying mechanism for developers. Some containers will need/want to exactly dictate the set of available synthetic keys and would largely ignore the URN in the channel section of the Cache-Control header; other containers may provide a combination of predefined keys as well as the ability to support arbitrary synthetic keys.

A spec which allows for both types of implementations will cause compatibility headaches for developers. The costs of supporting arbitrary synthetic keys are no laughing matter for really big caches, so I'm generally in favor of a stricter, container-dictated keying model.

Perhaps I missed it, but I didn't see anything in the spec where the cache could communicate its default key for the content to the original content provider?

Cheers

-Louis

John Hayes

Nov 5, 2008, 7:29:56 PM
to opensocial-an...@googlegroups.com
I just want to clarify that I am not proposing cache channels as the invalidation mechanism; they are just an illustrative example of a proposed standard in this space. I'm also quite concerned about the overhead that might be introduced with a large and distributed cache, as we'll be encountering this in our implementation. Whether keys are developer or container defined, the implementation strategy works out the same. First, some of the assumptions I'm working from:
 
1. When a key is invalidated, the developer is expressing the possibility that every resource in that set has changed and none of the resources outside of the set have changed.
2. After an invalidation, a cache must not serve the currently cached version of (at least) the set of resources identified by the key.
3. Most of the contents of a cache will not be accessed in the future, so as long as false invalidations aren't clustered it's unlikely to have an impact.
 
I apologize for being pedantic, but the important conclusion is that it isn't necessary to maintain a mapping from a key to a resource. It is necessary to maintain a mapping from a resource to a key, but only an approximation of the key.
 
Let's open up the use case for a system that could only invalidate on the basis of [gadget URL+opensocial id+view]; a reasonable implementation strategy would be:
 
* On receiving a cacheable document via proxy, makeRequest or data pipeline:
  - Calculate the hash of the gadget URL+opensocial ID+view
  - Get or store that hash with a version number
  - Add the hash and the version to the resource headers (like Etag)
 
* On receiving an invalidation notification
  - Store a changed version number with the hash of the invalidation key
 
* On validating a resource in cache
  - Look up the version number of the hash stored with the document; if it matches, the document is valid.
 
The storage space required for tracking validation state grows linearly with the number of documents (not the document size) and sub-linearly to track key versions.
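
A Python sketch of that strategy (the function and key names are illustrative; cache is assumed to be a memcached-style client):

import hashlib

def _key(gadget_url, os_id, view):
    raw = "%s|%s|%s" % (gadget_url, os_id, view)
    return hashlib.sha1(raw.encode()).hexdigest()

def store(cache, resource_url, gadget_url, os_id, view, body):
    k = _key(gadget_url, os_id, view)
    version = cache.get("ver:" + k)
    if version is None:  # first time this key has been seen
        version = 1
        cache.set("ver:" + k, version)
    # The resource remembers which version of its key it was stored under.
    cache.set("res:" + resource_url, (k, version, body))

def invalidate(cache, gadget_url, os_id, view):
    # Bumping the version makes every resource stored under the old
    # version fail validation; nothing is deleted eagerly.
    k = _key(gadget_url, os_id, view)
    cache.set("ver:" + k, (cache.get("ver:" + k) or 0) + 1)

def get_if_valid(cache, resource_url):
    entry = cache.get("res:" + resource_url)
    if entry is None:
        return None
    k, version, body = entry
    # Valid only if the key's version hasn't changed since storage.
    return body if cache.get("ver:" + k) == version else None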
 
A container would probably have further keys for administrative purposes, as a developer or a container might wish to flush by: url, user, user + url, gadget url, view, view + user, io.makeRequest domain, language, static resources (images or css), app id, developer id, gadget url + user, access token. Data pipelining introduces indirect dependencies: if a gadget server posts the result of a people request to a proxy view, then a change in that people request should also invalidate the proxy view itself.
 
The easiest implementation of indirect dependencies would just roll the keys into a union. I hope this isn't too scary, but the higher the quality of invalidation, the longer resources can be kept in cache. There are 5 different types of keys that might be present on a single document:
 
1. For purely internal use (not documented):
 
An OAuth access token is found to be invalid and the user changes it to a valid one. All documents retrieved via the old access token should be invalidated, because it's unknown whether the new access token refers to the same person.
 
2. For development use (documented, not standard):
 
A developer gets reports that a view isn't functioning for some users; they fix the error and then instruct the container to refetch all of those views.
 
3. Implied from request context (documented, standard):
 
The container always adds the opensocial ID of the owner to resources used in a profile view. When a developer wants to trigger an update, they invalidate the opensocial ID.
 
4. Cascaded dependencies (not documented):
 
One data-pipeline request is passed into another request.
 
5. From the developer:
 
The developer includes keys from their own data structures and doesn't know anything about OpenSocial.
 
I feel like I've rambled on a bit, but I hope this has helped show that developer-specified keys aren't incrementally onerous for the container and are in line with a container implementing a complete internal invalidation strategy. I think it'll be a boon to developers, as they can report changes according to their own data structures instead of OpenSocial's.
 
An invalidation protocol with inconsistent behavior across containers probably isn't worth having.
 
John

Louis Ryan

Nov 6, 2008, 3:06:22 AM
to opensocial-an...@googlegroups.com
John,

Sorry if I misconstrued. I think you've outlined below something very similar to what Kevin/Evan proposed, which I was in broad agreement with anyway :)

As for the container specified keys vs. developer specified keys question I agree the theoretical computational costs are similar. However I still favor container specified keys as the approach in the initial release for a couple of reasons. 
- It provides structure to developers without them having to design their own.
- It allows containers to enforce quota constraints more uniformly; individual developers could define their own synthetic keys at varying degrees of granularity, making monitoring of application behavior more difficult for the container.

If we choose container-specified keys, it would be appropriate for the request to the developer to contain the key in some concrete form which the developer can then use at a later time to invalidate. ETag doesn't seem like a good fit for this, so I was assuming some arbitrarily named header or Cache-Control part. Any suggestions?

One open question inlined below.

On Wed, Nov 5, 2008 at 4:29 PM, John Hayes <john.mar...@gmail.com> wrote:
 
A container would probably have further keys for administrative purposes, as a developer or a container might wish to flush by: url, user, user + url, gadget url, view, view + user, io.makeRequest domain, language, static resources (images or css), app id, developer id, gadget url + user, access token. Data pipelining introduces indirect dependencies: if a gadget server posts the result of a people request to a proxy view, then a change in that people request should also invalidate the proxy view itself.

re data-pipelining - Do you mean a change in the (hashed) result of the people request, or of the request definition itself? The latter can be handled by developers themselves; the former can be sensitive to irrelevant field changes, particularly when requests tend to be over-specified. Containers would likely have a fuzzy definition of 'change' here so they can optimize. Some developers will have a purely time-based expiration model and won't care about changes to the content of the data-pipelining request.

John Hayes

Nov 6, 2008, 11:01:08 AM
to opensocial-an...@googlegroups.com
On the data pipelining: this depends on what's written into the data pipelining spec. As an opinion, if a container decides to cache a resource known to be constructed from another, dependent resource, then the first resource should be invalidated when either of them is invalidated by any means, including standard HTTP cache-control. Tracking this dependency should be an entirely internal process of the container, not subject to standardization; the container is the only point where changes to the opensocial database can be reliably observed, and only the container knows the real structure of its own database.
 
Before commenting on container-assigned invalidation keys, I need to better understand what you mean, as I see two possible interpretations:
 
1. The container has a standard set of keys that are assigned to each request, and the developer is expected to be able to reliably interpret these keys and relate them to their internal database.
2. Each container implementation generates its own keys, and while the developer may interpret them according to container-specific knowledge, to support all containers they will have to store them as opaque tokens.
 
John

Louis Ryan

Nov 6, 2008, 1:06:19 PM
to opensocial-an...@googlegroups.com
I mean #1

John Hayes

Nov 6, 2008, 4:20:28 PM
to opensocial-an...@googlegroups.com
Ok, so when I say a combination of container-defined and developer-defined tags: the container would add the set of tags documented with each request mode, and then the developer could add whatever additional tags they like.
 
So if a developer does nothing but indicate that a resource can be invalidate-cached, they can come back later and invalidate it with a container-inserted identifier. I think this is useful for an evolving application, and as a safety net to prevent developers from creating unpurgeable resources. Developer-specified keys would always be in addition.

There are two things that cause me suspicion:
 
1. Sending the keys to the developer: how do you expect a developer to use this information?
2. Using keys to implement policy: what policies would depend on keys rather than concepts already natively understood by the container, like users, gadgets, or URLs?
 
John