Replaces document cache with a per-module cache in CajaContentRewriter. (issue4184049)

2 views
Skip to first unread message

john...@gmail.com

unread,
Feb 24, 2011, 1:35:12 AM2/24/11
to mikes...@gmail.com, fa...@google.com, google-ca...@googlegroups.com, re...@codereview.appspotmail.com
Pumped to see this! Sorry to have lost track of it in my queue.


http://codereview.appspot.com/4184049/diff/1001/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/CajaContentRewriter.java
File
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/CajaContentRewriter.java
(right):

http://codereview.appspot.com/4184049/diff/1001/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/CajaContentRewriter.java#newcode114
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/CajaContentRewriter.java:114:
HtmlSerializer htmlSerializer, ProxyUriManager proxyUriManager) {
nit: tabs rather than spaces

http://codereview.appspot.com/4184049/diff/1001/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCache.java
File
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCache.java
(right):

http://codereview.appspot.com/4184049/diff/1001/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCache.java#newcode50
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCache.java:50:
return cloneJobs(cachedJobs);
no particular issue w/ this from Shindig's side, but it does seem
somewhat odd to mandate cloning of fetched Jobs here rather than
wherever they're used (ie. in CompilerPlugin and down the stack).
thoughts?

http://codereview.appspot.com/4184049/diff/1001/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKey.java
File
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKey.java
(right):

http://codereview.appspot.com/4184049/diff/1001/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKey.java#newcode46
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKey.java:46:
public ModuleCacheKeys asSingleton() {
nomenclature curiosity: is this a singleton?

http://codereview.appspot.com/4184049/diff/1001/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKey.java#newcode60
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKey.java:60:
| ((hashBytes[3] & 0xff) << 24);
nit: could precompute this

http://codereview.appspot.com/4184049/diff/1001/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKey.java#newcode116
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKey.java:116:
posInBuffer = hash(node.getNodeName(), posInBuffer);
won't avoidance of hashing the attribute's value mean that these hash
equivalently:
<Foo attrib="one"></Foo>
<Foo attrib="two"></Foo>
?

http://codereview.appspot.com/4184049/diff/1001/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKey.java#newcode130
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKey.java:130:
flushBuffer(posInBuffer);
is this...

http://codereview.appspot.com/4184049/diff/1001/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKey.java#newcode135
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKey.java:135:
flushBuffer(posInBuffer);
...and this necessary? Seems we could just flushBuffer() at the very end
of the method @141, since the underlying hash(...) methods
requireSpaceInBuffer(...), flushing as needed.

http://codereview.appspot.com/4184049/diff/1001/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKey.java#newcode153
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKey.java:153:
buffer[++posInBuffer] = (byte) (n & 0xff);
random idea: the requireSpaceInBuffer/<update buffer> pattern could be
turned into a single method:
updateBuffer(int posInBuffer, byte... values) {
requireSpaceInBuffer(values.length, posInBuffer);
// update w/ all bytes etc
}

http://codereview.appspot.com/4184049/diff/1001/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKeys.java
File
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKeys.java
(right):

http://codereview.appspot.com/4184049/diff/1001/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKeys.java#newcode44
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKeys.java:44:
if (!other.iterator().hasNext()) { return this; }
seems like a fairly expensive check given that the impl copies the Key
list. Consider checking for other instanceof ModuleCacheKeys, and if so
checking other.keys.size() == 0

http://codereview.appspot.com/4184049/diff/1001/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKeys.java#newcode65
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKeys.java:65:
return ImmutableList.<JobCache.Key>copyOf(keys).iterator();
just checking, is the copy required?

http://codereview.appspot.com/4184049/

mikes...@gmail.com

unread,
Feb 24, 2011, 11:50:07 AM2/24/11
to fa...@google.com, john...@gmail.com, google-ca...@googlegroups.com, re...@codereview.appspotmail.com
Reviewers: fargo, johnfargo,


http://codereview.appspot.com/4184049/diff/1001/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/CajaContentRewriter.java
File
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/CajaContentRewriter.java
(right):

http://codereview.appspot.com/4184049/diff/1001/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/CajaContentRewriter.java#newcode114
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/CajaContentRewriter.java:114:
HtmlSerializer htmlSerializer, ProxyUriManager proxyUriManager) {

On 2011/02/24 06:35:12, johnfargo wrote:
> nit: tabs rather than spaces

Done in three files.

http://codereview.appspot.com/4184049/diff/1001/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCache.java
File
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCache.java
(right):

http://codereview.appspot.com/4184049/diff/1001/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCache.java#newcode50
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCache.java:50:
return cloneJobs(cachedJobs);

On 2011/02/24 06:35:12, johnfargo wrote:
> no particular issue w/ this from Shindig's side, but it does seem
somewhat odd
> to mandate cloning of fetched Jobs here rather than wherever they're
used (ie.
> in CompilerPlugin and down the stack). thoughts?

This is protecting the invariant that no fetcher can affect what is seen
by a subsequent fetcher. Only a storer can affect what a fetcher sees.

http://codereview.appspot.com/4184049/diff/1001/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKey.java
File
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKey.java
(right):

http://codereview.appspot.com/4184049/diff/1001/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKey.java#newcode46
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKey.java:46:
public ModuleCacheKeys asSingleton() {

On 2011/02/24 06:35:12, johnfargo wrote:
> nomenclature curiosity: is this a singleton?

It's a singleton in the sense that java.util.Collections.singleton is a
singleton. A collection containing one item.

I don't think it's likely to be confused with the singleton modifier as
applied to a concrete type since there's no concrete type passed in
here.

http://codereview.appspot.com/4184049/diff/1001/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKey.java#newcode60
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKey.java:60:
| ((hashBytes[3] & 0xff) << 24);

On 2011/02/24 06:35:12, johnfargo wrote:
> nit: could precompute this

Yep. I was thinking that a JIT would optimize that to a read of the
first 4 bytes of the array a la C:
union {
int n;
char[] bytes;
}
but realized that's making endianness assumptions.

http://codereview.appspot.com/4184049/diff/1001/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKey.java#newcode116
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKey.java:116:
posInBuffer = hash(node.getNodeName(), posInBuffer);

On 2011/02/24 06:35:12, johnfargo wrote:
> won't avoidance of hashing the attribute's value mean that these hash
> equivalently:
> <Foo attrib="one"></Foo>
> <Foo attrib="two"></Foo>
> ?

Nope. This is captured in a test case:

74 public final void testKeysDifferBasedOnAttributeValue() throws
Exception {
75 assertKeysDiffer(key("<input type=text>"), key("<input
type=checkbox>"));
76 }

The little known DOM quirk that handles this is that each Attr has a
single Text node child that contains its value.

http://codereview.appspot.com/4184049/diff/1001/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKey.java#newcode135
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKey.java:135:
flushBuffer(posInBuffer);

On 2011/02/24 06:35:12, johnfargo wrote:
> ...and this necessary? Seems we could just flushBuffer() at the very
end of the
> method @141, since the underlying hash(...) methods
requireSpaceInBuffer(...),
> flushing as needed.

posInBuffer is local. I'll give refactoring a shot.

http://codereview.appspot.com/4184049/diff/1001/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKey.java#newcode153
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKey.java:153:
buffer[++posInBuffer] = (byte) (n & 0xff);

On 2011/02/24 06:35:12, johnfargo wrote:
> random idea: the requireSpaceInBuffer/<update buffer> pattern could be
turned
> into a single method:
> updateBuffer(int posInBuffer, byte... values) {
> requireSpaceInBuffer(values.length, posInBuffer);
> // update w/ all bytes etc
> }

The ... requires an array allocation and copy. I don't know how well
this is optimized by the JIT.

http://codereview.appspot.com/4184049/diff/1001/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKeys.java
File
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKeys.java
(right):

http://codereview.appspot.com/4184049/diff/1001/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKeys.java#newcode44
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKeys.java:44:
if (!other.iterator().hasNext()) { return this; }

On 2011/02/24 06:35:12, johnfargo wrote:
> seems like a fairly expensive check given that the impl copies the Key
list.
> Consider checking for other instanceof ModuleCacheKeys, and if so
checking
> other.keys.size() == 0

There isn't actually a copy. See comment below.
The semantics of union as specified require that it work with
JobCache.none().

http://codereview.appspot.com/4184049/diff/1001/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKeys.java#newcode65
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKeys.java:65:
return ImmutableList.<JobCache.Key>copyOf(keys).iterator();

On 2011/02/24 06:35:12, johnfargo wrote:
> just checking, is the copy required?

This is confusing. ImmutableList.copyOf does not copy when the argument
is an ImmutableList. Fixed.

From
http://google-collections.googlecode.com/svn/trunk/javadoc/com/google/common/collect/ImmutableList.html#copyOf(java.lang.Iterable)

Note: Despite what the method name suggests, if elements is an
ImmutableList, no copy will actually be performed, and the given list
itself will be returned.

Description:
This uses the existing object cache factory to create a cache from
module keys (implemented as MD5 hashes over parse trees) to module
content.

This requires an outstanding minor API change to PluginCompiler and
cannot be deployed until we push a new version of caja to maven.

Please review this at http://codereview.appspot.com/4184049/

Affected files:
M
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/CajaContentRewriter.java
M
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/HttpRequestHandler.java
M
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/MakeRequestHandler.java
A
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCache.java
A
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKey.java
A
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKeys.java
A
java/gadgets/src/test/java/org/apache/shindig/gadgets/servlet/ModuleCacheTest.java


mikes...@gmail.com

unread,
Feb 24, 2011, 11:57:02 AM2/24/11
to fa...@google.com, john...@gmail.com, google-ca...@googlegroups.com, re...@codereview.appspotmail.com

http://codereview.appspot.com/4184049/diff/1001/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKeys.java
File
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKeys.java
(right):

http://codereview.appspot.com/4184049/diff/1001/java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKeys.java#newcode65


java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/ModuleCacheKeys.java:65:
return ImmutableList.<JobCache.Key>copyOf(keys).iterator();

On 2011/02/24 16:50:07, MikeSamuel wrote:
> On 2011/02/24 06:35:12, johnfargo wrote:
> > just checking, is the copy required?

> This is confusing. ImmutableList.copyOf does not copy when the
argument is an
> ImmutableList. Fixed.

> From

http://google-collections.googlecode.com/svn/trunk/javadoc/com/google/common/collect/ImmutableList.html#copyOf%28java.lang.Iterable)

> Note: Despite what the method name suggests, if elements is an
ImmutableList, no
> copy will actually be performed, and the given list itself will be
returned.

I remember why I did this. This is a cheap, type-safe way to return an
Iterator<JobCache.Key> given an Iterator<ModuleCacheKey>. The former is
required by the JobCache.Keys API. Since nothing can be added to an
ImmutableList there is no difference between an ImmutableList<T> and an
ImmutableList<? super T>.

http://codereview.appspot.com/4184049/

John Hjelmstad

unread,
Feb 24, 2011, 1:05:45 PM2/24/11
to mikes...@gmail.com, fa...@google.com, john...@gmail.com, google-ca...@googlegroups.com, re...@codereview.appspotmail.com
Thanks for the quick reply, as well as informative explanation about the various at-best-semi-obvious bits here ;) Latest patch LGTM, committing shortly...

John Hjelmstad

unread,
Feb 24, 2011, 1:28:58 PM2/24/11
to mikes...@gmail.com, fa...@google.com, john...@gmail.com, google-ca...@googlegroups.com, re...@codereview.appspotmail.com
Correction, I'll wait until the Maven repo for Caja has been updated w/ the necessary API changes :)

Mike Samuel

unread,
Feb 24, 2011, 2:58:44 PM2/24/11
to John Hjelmstad, fa...@google.com, google-ca...@googlegroups.com, re...@codereview.appspotmail.com
2011/2/24 John Hjelmstad <john...@gmail.com>:

> Correction, I'll wait until the Maven repo for Caja has been updated w/ the
> necessary API changes :)

Yep. I'll get on that.

john...@gmail.com

unread,
Feb 25, 2011, 3:35:57 PM2/25/11
to mikes...@gmail.com, fa...@google.com, google-ca...@googlegroups.com, re...@codereview.appspotmail.com
Updated patch w/ new Caja version in pom.xml, all builds and tests fine.
Committed as r1074690.


http://codereview.appspot.com/4184049/

Reply all
Reply to author
Forward
0 new messages