Re-proposing the content rewriter feature for 0.9 that didn't quite make it into 0.8.1.
Original thread is here: https://groups.google.com/group/opensocial-and-gadgets-spec/browse_thread/thread/7a9bd86e74ab803e/5448bfb5084c373e?lnk=gst&q=rewriter#5448bfb5084c373e
Changes from the original thread:
- Removal of include-tags, as it's essentially redundant with the ability to match by links in tags anyway, and it was causing confusion in practice as people were trying to use regexes here.
-------------------------------------------
Proposal
Add support for rewriting the content of a generated gadget, and allow developers to control the behavior of the rewriter by introducing a new standard optional gadget feature, content-rewrite.
E.g.
<Optional feature="content-rewrite">
<Param name="expires">86400</Param>
<Param name="include-urls">.*</Param>
<Param name="exclude-urls"></Param>
<Param name="minify-css">true</Param>
<Param name="minify-js">true</Param>
<Param name="minify-html">true</Param>
</Optional>
The feature supports the following parameters:
expires - The duration in seconds to force as minimum HTTP cache time for content fetched through the proxy via a rewritten URL. Default 86400
include-urls - A regex used to match URLs to rewrite to proxied form. Default .*
exclude-urls - A regex used to match URLs to exclude from rewriting. Processed after include-urls. Default is not to exclude any URL if not specified
minify-css - Controls whether the container will attempt to minify css in style tags and referenced css files. Valid values are true|false. Container specific default
minify-js - Controls whether the container will attempt to minify JS in script tags and referenced JS files. Valid values are true|false. Container specific default
minify-html - Controls whether the container will attempt to minify HTML content. Valid values are true|false. Container specific default
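To make the include/exclude ordering and the expires floor concrete, here is a minimal sketch in Python. The function names are hypothetical, and it assumes an unanchored regex search (the proposal does not say whether patterns must match the whole URL) and that "minimum cache time" means taking the larger of the origin's value and expires.

```python
import re

def should_rewrite(url, include_urls=".*", exclude_urls=""):
    """Decide whether a URL is rewritten to proxied form (hypothetical helper)."""
    # include-urls is checked first; the default ".*" makes every URL a candidate.
    # Assumption: unanchored search rather than a full-string match.
    if re.search(include_urls, url) is None:
        return False
    # exclude-urls is processed after include-urls; empty means exclude nothing.
    if exclude_urls and re.search(exclude_urls, url) is not None:
        return False
    return True

def effective_cache_seconds(origin_max_age, expires=86400):
    """Treat expires as a floor on the cache time of proxied content (assumption)."""
    return max(origin_max_age, expires)
```

For instance, should_rewrite("http://example.com/a.gif", exclude_urls=r"\.gif$") would come back false, while the same URL with the default parameters would be rewritten.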
Containers are free to perform additional optimizations when rewriting links including but not limited to:
- Extract @import directives from style tags and convert them into link tags in the head tag
- Merge multiple CSS fetches from successive link tags into one link tag that causes the proxy to concatenate the content fetched from the individual URLs
- Merge contiguous <script src=xxx> tags into one concatenating proxy fetch
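As an illustration of the last optimization, the sketch below collapses runs of contiguous script references into a single concatenating fetch. The /gadgets/concat endpoint, its numbered query parameters, and the (kind, value) node representation are all assumptions made for the example, not part of this proposal.

```python
from urllib.parse import urlencode

CONCAT_PROXY = "/gadgets/concat"  # hypothetical concatenating proxy endpoint

def merge_contiguous_scripts(nodes):
    """nodes is a list of ("script", src) or ("other", text) tuples in document order."""
    merged = []
    run = []  # srcs of the current run of adjacent script tags

    def flush():
        if len(run) == 1:
            # A lone script tag gains nothing from concatenation.
            merged.append(("script", run[0]))
        elif run:
            # Number the URLs so the proxy can concatenate them in order.
            query = urlencode([(str(i), src) for i, src in enumerate(run, start=1)])
            merged.append(("script", CONCAT_PROXY + "?" + query))
        run.clear()

    for kind, value in nodes:
        if kind == "script":
            run.append(value)
        else:
            # Any other node breaks the run, so script order is preserved.
            flush()
            merged.append((kind, value))
    flush()
    return merged
```

Only adjacent tags are merged, so execution order relative to the rest of the document does not change.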
Should we suggest a convention for <Param>s that configure alternate rewriting behaviors as well?
--John
We should not support execution of user supplied regular expressions.
This creates a significant risk of DOS or worse. Would prefix
matching provide sufficient functionality?
Even perfect regular expression libraries have exponential worst-case run times.
Buggy regular expression libraries allow execution of arbitrary code.
For example:
http://www.securityfocus.com/bid/26346
http://www.cisco.com/en/US/products/products_security_response09186a00808bb91c.html
http://www.linuxdevcenter.com/pub/a/linux/2003/12/29/insecurities.html
> include-urls - A regex used to match URLs to rewrite to proxied form.
> Default .*
>
> exclude-urls - A regex used to match URLs to exclude from rewriting.
> Processed after include-urls. Default is not to exclude any URL if not
> specified
Shell-style globbing still allows very bad worst-case runtimes, via
patterns that have multiple wild cards, e.g. *a*a*.
If we restrict to a single "*" wild card we can make the matching time
linear in the length of the string to be tested. That still permits
the patterns you gave above. (Implementation is also trivial, just
split on '*' and then do a prefix and suffix match. A pattern with no
* is a special case that is evaluated via string equality.)
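The single-wildcard matching described above can be sketched in a few lines of Python. The function name is hypothetical, and rejecting patterns with more than one '*' by raising an error is an assumption; the thread does not say how such patterns should be handled.

```python
def matches_single_wildcard(pattern, url):
    """Match url against a pattern containing at most one '*' wildcard."""
    if pattern.count("*") > 1:
        # Assumption: multi-wildcard patterns are rejected outright.
        raise ValueError("at most one '*' wildcard is allowed")
    if "*" not in pattern:
        # No wildcard: plain string equality.
        return pattern == url
    prefix, suffix = pattern.split("*")
    # The length check stops the prefix and suffix from overlapping in url.
    return (len(url) >= len(prefix) + len(suffix)
            and url.startswith(prefix)
            and url.endswith(suffix))
```

Each call does one prefix comparison and one suffix comparison, so the cost is linear in the length of the URL.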
And the spec would be “port this”?
Potential usability issue: since ‘include-urls’ and ‘exclude-urls’ can really only include one URL per entry, it would make sense to make the terms singular (include-url, exclude-url). Reason: when scanning this, I first assumed the param took a comma-delimited list. After actually reading it, I learned my scan was wrong.
One final issue: the include-url/exclude-url logic assumes case-sensitive matching. This is hard to get right and many developers wouldn’t know they got it wrong. This is an issue that hits experienced and inexperienced alike.
Suggestion: make case-insensitive matches the default. Update the feature to look like this:
<Optional feature="content-rewrite">
<Param name="case-sensitive-exclude-url">true</Param>
<!-- Exclude .GiF only ... -->
<Param name="exclude-url">.GiF</Param>
</Optional>
Possible alternative
<Optional feature="content-rewrite">
<!-- Exclude .GiF only ... -->
<Param name="exclude-url" caseSensitive="true">.GiF</Param>
</Optional>
The only non-contrived cases that I am aware of involve Unix deployments where /images and /Images are valid, side-by-side directory names. In other words, the non-contrived case is: the developer has a poorly laid out directory structure and needs to use the feature.
Given this sentiment, I’m happy with John Hjelmstad’s suggestion to make case sensitivity a global switch where it is either on or off and, by default, it is off.
The options are then
<Optional feature="content-rewrite">
<Param name="case-sensitive-exclude-url">true</Param>
<Param name="case-sensitive-include-url">true</Param>
<!-- Exclude .GiF only ... -->
<Param name="exclude-url">.GiF</Param>
</Optional>
Or
<Optional feature="content-rewrite">
<!-- One switch to enable/disable case-sensitivity on URLs -->
<Param name="case-sensitive-url">true</Param>
<!-- Exclude .GiF only ... -->
<Param name="exclude-url">.GiF</Param>
</Optional>
I’m in favor of fewer switches, so I’d prefer the last option. That said, if someone says “I have a use case that shows we really need to handle this on both”, I’ll happily accept your use case. To me, case-insensitive comparison by default is more important than a discussion about whether to have 1 or 2 switches.
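For what it's worth, a container could interpret the single-switch option roughly as follows. This is only a sketch: the params dict stands in for the parsed <Param> values, and plain substring matching for exclude-url is an assumption, since the matching semantics are still under discussion in this thread.

```python
def is_excluded(url, params):
    """Apply exclude-url, honoring the global case-sensitive-url switch."""
    pattern = params.get("exclude-url", "")
    if not pattern:
        # Nothing to exclude when the param is missing or empty.
        return False
    # Case-insensitive matching is the default; only "true" turns it off.
    case_sensitive = params.get("case-sensitive-url", "false").lower() == "true"
    if not case_sensitive:
        url, pattern = url.lower(), pattern.lower()
    # Assumption: the pattern matches if it occurs anywhere in the URL.
    return pattern in url
```

With the default, an exclude-url of .GiF would also exclude image.gif; with case-sensitive-url set to true, only URLs containing exactly .GiF are excluded.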
+1
After some discussion internally, it sounds like EXPIRES may be problematic in this iteration. Does anyone object to making EXPIRES optional in the spec with behavior to ignore the parameter when it’s not supported?
As we refresh apps and store content, we serve up new URLs, even if the developers use the same URLs in their XML. So the caching/refreshing happens automatically, and developers won’t need to worry about expiration timing.
I’m +1
+1
This is a better implementation. Suggest the default be case-insensitive.
<Optional feature="content-rewrite">
<!-- Exclude .GiF only ... -->
<Param name="exclude-url" caseSensitive="true">.GiF</Param>
</Optional>