Added:
changes/mikesamuel/malformed-html-20-Dec-2007/
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/
- copied from r274, /trunk/
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/lexer/HtmlTextEscapingMode.java
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/parser/html/AbstractElementStack.java
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/parser/html/DomParserMessageType.java
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/parser/html/Html5ElementStack.java
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/parser/html/IllegalDocumentStateException.java
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/parser/html/OpenElementStack.java
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/parser/html/XmlElementStack.java
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/util/Join.java
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/lexer/htmllexergolden2.txt
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/lexer/htmllexerinput2.xml
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/util/JoinTest.java
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/util/MoreAsserts.java
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/third_party/java/htmlparser/
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/third_party/java/htmlparser/LICENSE.txt
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/third_party/java/htmlparser/README.txt
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/third_party/java/htmlparser/htmlparser.jar
(contents, props changed)
Modified:
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/build.xml
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/lexer/CssLexer.java
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/lexer/HtmlLexer.java
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/lexer/InputElementJoiner.java
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/lexer/Token.java
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/opensocial/DefaultGadgetRewriter.java
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/parser/AbstractParseTreeNode.java
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/parser/css/CssPropertySignature.java
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/parser/html/DomParser.java
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/parser/html/DomTree.java
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/plugin/ExpressionSanitizerCaja.java
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/plugin/GxpCompiler.java
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/plugin/HtmlPluginCompiler.java
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/plugin/HtmlPluginCompilerMain.java
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/AllTests.java
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/lexer/HtmlLexerTest.java
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/lexer/htmllexergolden1.txt
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/lexer/htmllexerinput1.html
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/opensocial/example-rewritten.xml
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/parser/css/CssParserTest.java
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/parser/html/DomParserTest.java
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/parser/js/ParserTest.java
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/parser/js/rendergolden1.txt
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/parser/quasiliteral/DefaultRewriterTest.java
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/parser/quasiliteral/MatchTest.java
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/plugin/CompiledPluginTest.java
changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/util/RhinoTestBed.java
Log:
Rewrote HTML parser to convert malformed markup into a valid parse tree.
Jason and Kunal have a requirement that they be able to handle
malformed markup and are currently reimplementing event handler
extraction and other bits of HtmlPluginCompiler so that they can
hand us just JavaScript.
This change is meant to get us to the point where we can take
malformed markup as input and coerce it to the same parse tree
a browser would see.
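To make "the same parse tree a browser would see" concrete, the sketch below replays three HTML5 error-recovery rules on a token stream: a start tag such as <div> implicitly closes an open <p>, a stray </p> produces an empty <p></p>, and </br> acts like <br>. It is a hypothetical illustration only. This change does not build trees this way; tree construction is delegated to libhtmlparser through Html5ElementStack. ToyTreeBuilder, its methods, and the small VOID and CLOSES_P tag sets are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical toy tree builder; not part of Caja or libhtmlparser. */
final class ToyTreeBuilder {
  // Elements that never have content, so they are never pushed on the stack.
  static final Set<String> VOID = new HashSet<String>(
      Arrays.asList("br", "hr", "img", "input", "link", "meta"));
  // A small subset of the start tags that close an open <p> in HTML5.
  static final Set<String> CLOSES_P = new HashSet<String>(
      Arrays.asList("div", "h1", "ol", "p", "table", "ul"));

  static final class Node {
    final String name;  // null for a text node
    final String text;  // null for an element
    final List<Node> children = new ArrayList<Node>();
    Node(String name, String text) { this.name = name; this.text = text; }
  }

  private final Node root = new Node("#root", null);
  // Open element stack; the current element is at the end.
  private final List<Node> open = new ArrayList<Node>();
  { open.add(root); }

  private Node current() { return open.get(open.size() - 1); }

  // Is an element with this name open? Index 0 (the root) is skipped.
  private boolean hasOpen(String name) {
    for (int i = open.size() - 1; i > 0; --i) {
      if (open.get(i).name.equals(name)) { return true; }
    }
    return false;
  }

  // Pops elements up to and including the nearest one with this name.
  private void closeThrough(String name) {
    while (open.size() > 1) {
      if (open.remove(open.size() - 1).name.equals(name)) { return; }
    }
  }

  void startTag(String name) {
    if (CLOSES_P.contains(name) && hasOpen("p")) { closeThrough("p"); }
    Node el = new Node(name, null);
    current().children.add(el);
    if (!VOID.contains(name)) { open.add(el); }
  }

  void endTag(String name) {
    if ("br".equals(name)) { startTag("br"); return; }  // </br> acts like <br>
    if ("p".equals(name) && !hasOpen("p")) { startTag("p"); }  // stray </p>
    if (hasOpen(name)) { closeThrough(name); }  // other stray end tags are dropped
  }

  void text(String s) { current().children.add(new Node(null, s)); }

  String render() {
    StringBuilder sb = new StringBuilder();
    for (Node child : root.children) { render(child, sb); }
    return sb.toString();
  }

  private static void render(Node n, StringBuilder sb) {
    if (n.name == null) { sb.append(n.text); return; }
    sb.append('<').append(n.name).append('>');
    if (VOID.contains(n.name)) { return; }  // no end tag for void elements
    for (Node child : n.children) { render(child, sb); }
    sb.append("</").append(n.name).append('>');
  }
}
```

Feeding it the tokens of <p>one<div>two</div></br></p> renders <p>one</p><div>two</div><br><p></p>. In HTML5 proper the list of start tags that close an open <p> is much longer, and the check is for a p element "in button scope" rather than anywhere on the stack; the sketch ignores scope boundaries for brevity.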
It will also make things easier security-wise. HTML doesn't guarantee
that if an id appears on only one element in the markup, only one node
in the resulting DOM will have that id. We can now coerce to XHTML,
which has that property.
I suggest you review in this order:
# New parser changes
java/.../parser/html/DomParser.java
java/.../parser/html/OpenElementStack.java
java/.../parser/html/XmlElementStack.java
javatests/.../parser/html/DomParserTest.java
java/.../parser/html/Html5ElementStack.java
# Changes to rendering of HTML doms
java/.../parser/html/DomTree.java
java/.../lexer/HtmlLexer.java
java/.../lexer/HtmlTextEscapingMode.java
# Everything else
This adds a third-party dependency on libhtmlparser, available
at http://about.validator.nu/htmlparser/ under the MPL.
I downloaded the most recent version, 1.5.1.
It implements the HTML5 parsing rules
(http://www.whatwg.org/specs/web-apps/current-work/), which attempt
to codify the behavior of existing browsers when parsing HTML.
Changes:
- Updated HtmlLexer to properly distinguish RCDATA and CDATA.
RCDATA is the content type for <title> and <textarea> tags,
which interpret special characters literally but still decode
entities. Also fixed the lexer so that it does not recognize
CDATA sections when parsing HTML.
- Modified DomParser to canonicalize element and attribute names.
Existing code tries to normalize case before comparing element
names to whitelists, but this is error-prone (and incorrect for
XHTML), so I now canonicalize at construction time.
- Rewrote DomTree.render to properly render unary (void) HTML tags.
Previously we output <br></br> for a single <br> tag, but browsers
treat a </br> token as a <br> token, so that markup yielded two
<br> elements.
- Defined OpenElementStack, an interface used by DomParser to
build the tree, and implemented a trivial one (XmlElementStack)
for XML parsing.
- Defined Html5ElementStack, which bridges OpenElementStack to
libhtmlparser. This uses our lexer and parse tree implementation,
and preserves file position info for both nodes and error messages.
- Fixed a bug: an NPE on <link> tags in gadget specs that specify
the media attribute but not the type attribute.
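The escaping-mode and canonicalization changes above can be sketched in isolation. Assuming a simplified model with only three modes and four mapped tags (the real HtmlTextEscapingMode has more of both), the code below canonicalizes a tag name with a fixed locale, looks up its mode, and returns the content as a browser would interpret it: CDATA content is kept verbatim, while RCDATA content has entity references decoded, which is why the lexer now classifies RCDATA content as TEXT rather than UNESCAPED. EscapingDemo, modeFor and literalContent are hypothetical names, and the four-entity decoder is a deliberate simplification.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch; a simplified stand-in for HtmlTextEscapingMode. */
final class EscapingDemo {
  enum Mode { PCDATA, CDATA, RCDATA }

  private static final Map<String, Mode> MODES = new HashMap<String, Mode>();
  static {
    MODES.put("script", Mode.CDATA);
    MODES.put("style", Mode.CDATA);
    MODES.put("textarea", Mode.RCDATA);
    MODES.put("title", Mode.RCDATA);
  }

  /** Lower-cases with a fixed locale so the result never depends on the default locale. */
  static String canonTagName(String tagName) {
    return tagName.toLowerCase(Locale.ENGLISH);
  }

  static Mode modeFor(String tagName) {
    Mode mode = MODES.get(canonTagName(tagName));
    return mode != null ? mode : Mode.PCDATA;
  }

  /** The text a browser sees for the raw content of a CDATA or RCDATA element. */
  static String literalContent(String tagName, String raw) {
    switch (modeFor(tagName)) {
      case CDATA:
        return raw;  // entity references are not decoded
      case RCDATA:
        // Decode a few named entities only; &amp; goes last so "&amp;lt;" becomes "&lt;".
        return raw.replace("&lt;", "<").replace("&gt;", ">")
            .replace("&quot;", "\"").replace("&amp;", "&");
      default:
        throw new IllegalArgumentException(
            tagName + " content is PCDATA: tags and comments are parsed, not literal");
    }
  }
}
```

The fixed locale matters: under a Turkish locale, "TITLE".toLowerCase() yields "tıtle" (with a dotless i), which would silently miss the "title" entry. That is the motivation for using toLowerCase(Locale.ENGLISH) in HtmlLexer.canonTagName.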
Modified: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/build.xml
==============================================================================
--- /trunk/src/build.xml (original)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/build.xml	Thu Dec 20 08:56:04 2007
@@ -52,6 +52,7 @@
<path id="classpath.compile">
<pathelement path="${third_party}/jakarta_commons/commons-cli.jar"/>
+ <pathelement path="${third_party}/htmlparser/htmlparser.jar"/>
</path>
<path id="classpath.run">
@@ -67,6 +68,7 @@
<pathelement path="${third_party}/junit/junit.jar"/>
<pathelement path="${third_party}/rhino/js.jar"/>
<pathelement path="${third_party}/emma/emma.jar"/>
+ <pathelement path="${third_party}/htmlparser/htmlparser.jar"/>
<pathelement path="${instr}/classes"/>
<!-- In case instrumentation not enabled -->
<pathelement path="${lib}"/>
@@ -201,7 +203,7 @@
<include name="**/caja/plugin/PluginCompilerTest.java"/>
<include name="**/caja/plugin/UrlUtilTest.java"/>
<include name="**/caja/plugin/caps/CapabilityRewriterTest.java"/>
- <include name="**/caja/util/PipelineTest.java"/>
+ <include name="**/caja/util/JoinTest.java"/>
<include name="**/caja/util/SparseBitSetTest.java"/>
<!-- compilerarg line="-Xlint:unchecked"/ -->
</javac>
@@ -210,7 +212,9 @@
<include name="**/caja/lexer/csslexergolden1.txt"/>
<include name="**/caja/lexer/csslexerinput1.css"/>
<include name="**/caja/lexer/htmllexergolden1.txt"/>
+ <include name="**/caja/lexer/htmllexergolden2.txt"/>
<include name="**/caja/lexer/htmllexerinput1.html"/>
+ <include name="**/caja/lexer/htmllexerinput2.xml"/>
<include name="**/caja/lexer/lexergolden1.txt"/>
<include name="**/caja/lexer/lexergolden2.txt"/>
<include name="**/caja/lexer/lexertest1.js"/>
@@ -246,6 +250,9 @@
<include name="**/caja/parser/js/rendergolden4.txt"/>
<include name="**/caja/parser/js/rendergolden5.txt"/>
<include name="**/caja/parser/js/rendergolden6.txt"/>
+ <include name="**/caja/parser/quasiliteral/clickme.js"/>
+ <include name="**/caja/parser/quasiliteral/function.js"/>
+ <include name="**/caja/parser/quasiliteral/listfriends.js"/>
<include name="**/caja/plugin/asserts.js"/>
<include name="**/caja/plugin/browser-stubs.js"/>
<include name="**/caja/plugin/gxpcompilergolden1.js"/>
@@ -305,8 +312,9 @@
<javac destdir="${lib}" debug="true" target="1.5" source="1.5">
<src path="${testsrc}"/>
<classpath refid="classpath.tests.compile"/>
- <include name="**/caja/util/TestUtil.java"/>
+ <include name="**/caja/util/MoreAsserts.java"/>
<include name="**/caja/util/RhinoTestBed.java"/>
+ <include name="**/caja/util/TestUtil.java"/>
<!-- compilerarg line="-Xlint:unchecked"/ -->
</javac>
</target>
@@ -494,6 +502,7 @@
<include name="**/caja/lexer/CssLexer.java"/>
<include name="**/caja/lexer/CssTokenType.java"/>
<include name="**/caja/lexer/HtmlLexer.java"/>
+ <include name="**/caja/lexer/HtmlTextEscapingMode.java"/>
<include name="**/caja/lexer/HtmlTokenType.java"/>
<include name="**/caja/lexer/InputElementJoiner.java"/>
<include name="**/caja/lexer/InputElementSplitter.java"/>
Modified: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/lexer/CssLexer.java
==============================================================================
--- /trunk/src/java/com/google/caja/lexer/CssLexer.java (original)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/lexer/CssLexer.java	Thu Dec 20 08:56:04 2007
@@ -122,7 +122,7 @@
FilePosition fp
= FilePosition.span(pending.getFirst().pos, pending.getLast().pos);
pending.clear();
- pending.add(new Token<CssTokenType>(sb.toString(), type, fp));
+ pending.add(Token.instance(sb.toString(), type, fp));
}
/**
@@ -446,7 +446,7 @@
cp.getCurrentPosition(epos);
assert sb.length() > 0
: "ch=" + ch + " : " + chi + " : " + spos + " : " + type;
- pending = new Token<CssTokenType>(sb.toString(), type,
+ pending = Token.instance(sb.toString(), type,
FilePosition.instance(
spos.source,
spos.lineNo, spos.lineNo, spos.charInFile, spos.charInLine,
Modified: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/lexer/HtmlLexer.java
==============================================================================
--- /trunk/src/java/com/google/caja/lexer/HtmlLexer.java (original)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/lexer/HtmlLexer.java	Thu Dec 20 08:56:04 2007
@@ -18,8 +18,7 @@
import com.google.caja.reporting.MessageType;
import java.io.IOException;
-import java.util.HashSet;
-import java.util.Set;
+import java.util.Locale;
/**
* A flexible lexer for html, gxp, and related document types.
@@ -198,6 +197,8 @@
*/
private String escapeExemptTagName = null;
+ private HtmlTextEscapingMode textEscapingMode;
+
public HtmlInputSplitter(CharProducer p) { this.p = p; }
/**
@@ -228,20 +229,29 @@
// reclassify as UNESCAPED, any tokens that appear in the middle.
if (inEscapeExemptBlock) {
if (token.type == HtmlTokenType.TAGBEGIN && '/' == token.text.charAt(1)
+ && textEscapingMode != HtmlTextEscapingMode.PLAIN_TEXT
&& canonTagName(token.text.substring(2)).equals(escapeExemptTagName)
) {
- inEscapeExemptBlock = false;
- escapeExemptTagName = null;
+ this.inEscapeExemptBlock = false;
+ this.escapeExemptTagName = null;
+ this.textEscapingMode = null;
} else if (token.type != HtmlTokenType.SERVERCODE) {
- token = reclassify(token, HtmlTokenType.UNESCAPED);
+ // classify RCDATA as text since it can contain entities
+ token = reclassify(
+ token, (this.textEscapingMode == HtmlTextEscapingMode.RCDATA
+ ? HtmlTokenType.TEXT
+ : HtmlTokenType.UNESCAPED));
}
- } else {
+ } else if (!asXml) {
switch (token.type) {
case TAGBEGIN:
{
- String tagName = token.text.substring(1);
- if (this.isEscapeExemptTagName(tagName)) {
- this.escapeExemptTagName = canonTagName(tagName);
+ String canonTagName = canonTagName(token.text.substring(1));
+ if (HtmlTextEscapingMode
+ .tagFollowedByLiteralContent(canonTagName)) {
+ this.escapeExemptTagName = canonTagName;
+ this.textEscapingMode
+ = HtmlTextEscapingMode.getModeForTag(canonTagName);
}
break;
}
@@ -422,7 +432,7 @@
}
break;
case BANG:
- if ('[' == ch) {
+ if ('[' == ch && asXml) {
state = State.CDATA;
} else if ('-' == ch) {
state = State.BANG_DASH;
@@ -564,21 +574,8 @@
}
}
- private static final Set<String> ESCAPE_EXEMPT_TAGS = new HashSet<String>();
- static {
- ESCAPE_EXEMPT_TAGS.add("listing");
- ESCAPE_EXEMPT_TAGS.add("script");
- ESCAPE_EXEMPT_TAGS.add("style");
- ESCAPE_EXEMPT_TAGS.add("textarea");
- ESCAPE_EXEMPT_TAGS.add("xmp");
- }
-
- protected boolean isEscapeExemptTagName(String tagName) {
- return !asXml && ESCAPE_EXEMPT_TAGS.contains(canonTagName(tagName));
- }
-
protected String canonTagName(String tagName) {
- return asXml ? tagName : tagName.toLowerCase();
+ return asXml ? tagName : tagName.toLowerCase(Locale.ENGLISH);
}
static <T extends TokenType>
Added: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/lexer/HtmlTextEscapingMode.java
==============================================================================
--- (empty file)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/lexer/HtmlTextEscapingMode.java	Thu Dec 20 08:56:04 2007
@@ -0,0 +1,136 @@
+// Copyright (C) 2005 Google Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package com.google.caja.lexer;
+
+import java.util.HashMap;
+import java.util.Locale;
+import java.util.Map;
+
+/**
+ * From section 8.1.2.6 of http://www.whatwg.org/specs/web-apps/current-work/
+ * <p>
+ * The text in CDATA and RCDATA elements must not contain any
+ * occurrences of the string "</" (U+003C LESS-THAN SIGN, U+002F
+ * SOLIDUS) followed by characters that case-insensitively match the
+ * tag name of the element followed by one of U+0009 CHARACTER
+ * TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C
+ * FORM FEED (FF), U+0020 SPACE, U+003E GREATER-THAN SIGN (>), or
+ * U+002F SOLIDUS (/), unless that string is part of an escaping
+ * text span.
+ * </p>
+ *
+ * <p>
+ * See also
+ * http://www.whatwg.org/specs/web-apps/current-work/#cdata-rcdata-restrictions
+ * for the elements which fall in each category.
+ * </p>
+ *
+ * @author mikes...@gmail.com
+ */
+public enum HtmlTextEscapingMode {
+ /**
+ * Normally escaped character data that breaks around comments and tags.
+ */
+ PCDATA,
+ /**
+ * A span of text where HTML special characters are interpreted literally,
+ * as in a SCRIPT tag.
+ */
+ CDATA,
+ /**
+ * A span of text and character entity references where HTML special
+ * characters are interpreted literally, as in a TITLE tag.
+ */
+ RCDATA,
+ /**
+ * A span of text where HTML special characters are interpreted literally,
+ * where there is no end tag. PLAIN_TEXT runs until the end of the file.
+ */
+ PLAIN_TEXT,
+
+ /**
+ * Cannot contain data.
+ */
+ VOID,
+ ;
+
+ private static final Map<String, HtmlTextEscapingMode> ESCAPE_EXEMPT_TAGS
+ = new HashMap<String, HtmlTextEscapingMode>();
+ static {
+ ESCAPE_EXEMPT_TAGS.put("iframe", CDATA);
+ // HTML5 does not treat listing as CDATA, but HTML2 does
+ // at http://www.w3.org/MarkUp/1995-archive/NonStandard.html
+ // Listing is not supported by browsers.
+ //ESCAPE_EXEMPT_TAGS.put("listing", CDATA);
+
+ // Technically, only if embeds, frames, and scripts, respectively, are
+ // enabled.
+ ESCAPE_EXEMPT_TAGS.put("noembed", CDATA);
+ ESCAPE_EXEMPT_TAGS.put("noframes", CDATA);
+ ESCAPE_EXEMPT_TAGS.put("noscript", CDATA);
+
+ // Runs till end of file.
+ ESCAPE_EXEMPT_TAGS.put("plaintext", PLAIN_TEXT);
+
+ ESCAPE_EXEMPT_TAGS.put("script", CDATA);
+ ESCAPE_EXEMPT_TAGS.put("style", CDATA);
+
+ // Textarea and Title are RCDATA, not CDATA, so decode entity references.
+ ESCAPE_EXEMPT_TAGS.put("textarea", RCDATA);
+ ESCAPE_EXEMPT_TAGS.put("title", RCDATA);
+
+ ESCAPE_EXEMPT_TAGS.put("xmp", CDATA);
+
+ // Nodes that can't contain content.
+ ESCAPE_EXEMPT_TAGS.put("base", VOID);
+ ESCAPE_EXEMPT_TAGS.put("link", VOID);
+ ESCAPE_EXEMPT_TAGS.put("meta", VOID);
+ ESCAPE_EXEMPT_TAGS.put("hr", VOID);
+ ESCAPE_EXEMPT_TAGS.put("br", VOID);
+ ESCAPE_EXEMPT_TAGS.put("img", VOID);
+ ESCAPE_EXEMPT_TAGS.put("embed", VOID);
+ ESCAPE_EXEMPT_TAGS.put("param", VOID);
+ ESCAPE_EXEMPT_TAGS.put("area", VOID);
+ ESCAPE_EXEMPT_TAGS.put("col", VOID);
+ ESCAPE_EXEMPT_TAGS.put("input", VOID);
+ }
+
+ /**
+ * The mode used for content following a start tag with the given name.
+ */
+ public static HtmlTextEscapingMode getModeForTag(String canonTagName) {
+ assert canonTagName.toLowerCase(Locale.ENGLISH).equals(canonTagName);
+ HtmlTextEscapingMode mode = ESCAPE_EXEMPT_TAGS.get(canonTagName);
+ return mode != null ? mode : PCDATA;
+ }
+
+ /**
+ * True if content immediately following the start tag must be treated as
+ * special CDATA so that less-thans are not treated as starting tags, comments
+ * or directives.
+ */
+ public static boolean tagFollowedByLiteralContent(String tagName) {
+ HtmlTextEscapingMode mode = getModeForTag(tagName);
+ return mode != PCDATA && mode != VOID;
+ }
+
+ /**
+ * True iff the tag cannot contain any content -- will an HTML parser consider
+ * the element to have ended immediately after the start tag.
+ */
+ public static boolean isVoidElement(String tagName) {
+ return getModeForTag(tagName) == VOID;
+ }
+}
Modified: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/lexer/InputElementJoiner.java
==============================================================================
--- /trunk/src/java/com/google/caja/lexer/InputElementJoiner.java (original)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/lexer/InputElementJoiner.java	Thu Dec 20 08:56:04 2007
@@ -187,7 +187,7 @@
private static Token<JsTokenType> combine(
Token<JsTokenType> a, Token<JsTokenType> b, JsTokenType type) {
- return new Token<JsTokenType>(
+ return Token.instance(
a.text + b.text, type, FilePosition.span(a.pos, b.pos));
}
}
Modified: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/lexer/Token.java
==============================================================================
--- /trunk/src/java/com/google/caja/lexer/Token.java (original)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/lexer/Token.java	Thu Dec 20 08:56:04 2007
@@ -29,7 +29,7 @@
return new Token<TT>(text, type, pos);
}
- public Token(String text, T type, FilePosition pos) {
+ private Token(String text, T type, FilePosition pos) {
this.text = text;
this.type = type;
this.pos = pos;
Modified: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/opensocial/DefaultGadgetRewriter.java
==============================================================================
--- /trunk/src/java/com/google/caja/opensocial/DefaultGadgetRewriter.java (original)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/opensocial/DefaultGadgetRewriter.java	Thu Dec 20 08:56:04 2007
@@ -23,6 +23,7 @@
import com.google.caja.parser.AncestorChain;
import com.google.caja.parser.html.DomParser;
import com.google.caja.parser.html.DomTree;
+import com.google.caja.parser.html.OpenElementStack;
import com.google.caja.plugin.HtmlPluginCompiler;
import com.google.caja.plugin.PluginMeta;
import com.google.caja.reporting.MessageQueue;
@@ -106,7 +107,8 @@
TokenQueue<HtmlTokenType> tq = DomParser.makeTokenQueue(
is, new StringReader(htmlContent), false);
if (tq.isEmpty()) { return null; }
- DomTree.Fragment contentTree = DomParser.parseFragment(tq);
+ DomTree.Fragment contentTree = DomParser.parseFragment(
+ tq, OpenElementStack.Factory.createHtml5ElementStack(mq));
if (contentTree == null) {
mq.addMessage(OpenSocialMessageType.NO_CONTENT, is);
Modified: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/parser/AbstractParseTreeNode.java
==============================================================================
--- /trunk/src/java/com/google/caja/parser/AbstractParseTreeNode.java (original)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/parser/AbstractParseTreeNode.java	Thu Dec 20 08:56:04 2007
@@ -237,7 +237,6 @@
boolean result = true;
// This loop is complicated because it needs to survive mutations to the
// child list.
- int n = this.children.size();
List<? extends ParseTreeNode> childrenCache = this.children;
ParseTreeNode next = childrenCache.get(0);
@@ -430,8 +429,6 @@
@Override
boolean apply(boolean copied) {
- final AbstractParseTreeNode<T> owner = AbstractParseTreeNode.this;
-
if (!copied) { copyOnWrite(); }
// Find where to insert
@@ -455,7 +452,6 @@
@Override
void rollback() {
- final AbstractParseTreeNode<T> owner = AbstractParseTreeNode.this;
int childIndex = backupIndex;
// This check corresponds to the replacement.parent == null check in apply
@@ -492,8 +488,6 @@
@Override
void rollback() {
- final AbstractParseTreeNode<T> owner = AbstractParseTreeNode.this;
-
if (children.contains(toRemove)) { return; }
addChild(backupIndex, toRemove);
@@ -511,8 +505,6 @@
@Override
boolean apply(boolean copied) {
- final AbstractParseTreeNode<T> owner = AbstractParseTreeNode.this;
-
// Find where to insert
int childIndex;
if (null == before) {
@@ -537,8 +529,6 @@
@Override
void rollback() {
- final AbstractParseTreeNode<T> owner = AbstractParseTreeNode.this;
-
int childIndex = backupIndex;
ParseTreeNode removed = children.remove(childIndex);
Modified: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/parser/css/CssPropertySignature.java
==============================================================================
--- /trunk/src/java/com/google/caja/parser/css/CssPropertySignature.java (original)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/parser/css/CssPropertySignature.java	Thu Dec 20 08:56:04 2007
@@ -65,8 +65,7 @@
@Override
public CssPropertySignature clone() {
- return (CssPropertySignature)
- ParseTreeNodes.newNodeInstance(getClass(), getValue(), children());
+ return ParseTreeNodes.newNodeInstance(getClass(), getValue(), children());
}
/** A signature that can be repeated zero or more times. */
Added: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/parser/html/AbstractElementStack.java
==============================================================================
--- (empty file)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/parser/html/AbstractElementStack.java	Thu Dec 20 08:56:04 2007
@@ -0,0 +1,146 @@
+// Copyright (C) 2007 Google Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package com.google.caja.parser.html;
+
+import com.google.caja.lexer.FilePosition;
+import com.google.caja.lexer.HtmlTokenType;
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Abstract base class for OpenElementStack implementations that maintains the
+ * open element stack as the tree is built around it.
+ *
+ * @author mikes...@gmail.com
+ */
+abstract class AbstractElementStack implements OpenElementStack {
+ protected static final boolean DEBUG = false;
+ private DomTree.Fragment rootElement = new DomTree.Fragment();
+ /**
+ * A list of open elements.
+ */
+ private final List<DomTree> openElements = new ArrayList<DomTree>();
+
+ {
+ openElements.add(rootElement);
+ }
+
+ AbstractElementStack() {}
+
+ /** @inheritDoc */
+ public final DomTree.Fragment getRootElement() {
+ return rootElement;
+ }
+
+ /** @inheritDoc */
+ public void open(boolean fragment) {}
+
+ /** The current element -- according to HTML5 the stack grows down. */
+ protected final DomTree getBottomElement() {
+ return openElements.get(openElements.size() - 1);
+ }
+
+ /** The count of open elements. */
+ protected final int getNOpenElements() {
+ return openElements.size();
+ }
+
+ /** The index-th open element counting from 0 at the root. */
+ protected final DomTree.Tag getElement(int index) {
+ assert index > 0 : "" + index;
+ return (DomTree.Tag) openElements.get(index);
+ }
+
+ /**
+ * Adds an element to the element stack, puts it on the previous head's
+ * child list, and updates file positions.
+ *
+ * @param canonicalTagName a canonical tag name per
+ * {@link OpenElementStack#canonicalizeElementName} that is used as el's
+ * value.
+ */
+ protected final void push(DomTree.Tag el, String canonicalTagName) {
+ if (DEBUG) System.err.println("push(" + el + ", " + canonicalTagName + ")");
+ el.setTagName(canonicalTagName);
+ DomTree parent = getBottomElement();
+ openElements.add(el);
+ if (rootElement.getFilePosition() == null) {
+ rootElement.setFilePosition(el.getFilePosition());
+ }
+ doAppend(el, parent);
+ }
+
+ /**
+ * Append a node to the DOM tree as the child of the bottom.
+ * This may be overridden by subclasses if they wish to add at a different
+ * location.
+ */
+ protected void doAppend(DomTree el, DomTree parent) {
+ parent.insertBefore(el, null);
+ }
+
+ /**
+ * Pop the N bottom levels of the open element stack.
+ * @param endPos the position at which the popped elements should be
+ * considered to end.
+ */
+ protected final void popN(int n, FilePosition endPos) {
+ if (DEBUG) System.err.println("popN(" + n + ", " + endPos + ")");
+ while (--n >= 0) {
+ int top = openElements.size() - 1;
+ DomTree node = openElements.remove(top);
+ node.setFilePosition(FilePosition.span(node.getFilePosition(), endPos));
+ if (openElements.size() == 1) {
+ FilePosition rootPos = rootElement.getFilePosition();
+ if (rootPos.endCharInFile() <= 1) {
+ rootPos = rootElement.children().get(0).getFilePosition();
+ }
+ rootElement.setFilePosition(FilePosition.span(rootPos, endPos));
+ break;
+ }
+ }
+ }
+
+ /** Strip ignorable whitespace nodes from the root. */
+ protected void stripIgnorableText() {
+ if (rootElement.children().isEmpty()) { return; }
+
+ // No need to loop because processText normalizes.
+ DomTree firstChild = rootElement.children().get(0);
+ if (isIgnorableTextNode(firstChild)) {
+ rootElement.removeChild(firstChild);
+
+ if (rootElement.children().isEmpty()) { return; }
+ }
+
+ // No need to loop because processText normalizes.
+ DomTree lastChild = rootElement.children().get(
+ rootElement.children().size() - 1);
+ if (isIgnorableTextNode(lastChild)) {
+ rootElement.removeChild(lastChild);
+ }
+ }
+
+ /**
+ * @see <a href="http://www.w3.org/TR/REC-xml/#sec-white-space">ignorable
+ * white space</a>
+ */
+ private static boolean isIgnorableTextNode(DomTree t) {
+ // TODO(mikesamuel): check against XML&HTML definitions of whitespace.
+ // Note: CDATA and ESCAPED text purposefully not treated as whitespace.
+ return t instanceof DomTree.Text && t.getToken().type == HtmlTokenType.TEXT
+ && "".equals(t.getToken().text.trim());
+ }
+}
Modified: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/parser/html/DomParser.java
==============================================================================
--- /trunk/src/java/com/google/caja/parser/html/DomParser.java (original)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/parser/html/DomParser.java	Thu Dec 20 08:56:04 2007
@@ -33,21 +33,49 @@
/**
* Parses a {@link DomTree} from a stream of xml tokens.
- * This is a tolerant, non-validating, parser, but does require that tags
- * be balanced, as in XML. Since it's not validating, we don't bother to parse
- * DTDs, and so do not process external entities.
+ * This is a non-validating parser that will parse tolerantly when created
+ * with an
+ * {@link OpenElementStack.Factory#createHtml5ElementStack Html5ElementStack}, or
+ * will require balanced tags when created with an
+ * {@link OpenElementStack.Factory#createXmlElementStack XmlElementStack}.
+ * <p>
+ * Since it's not validating, we don't bother to parse DTDs, and so do not
+ * process external entities. Parsing will not cause URI resolution or
+ * fetching.
*
* @author mikes...@gmail.com
*/
public final class DomParser {
-
public static DomTree parseDocument(TokenQueue<HtmlTokenType> tokens)
throws ParseException {
- ignoreTopLevelIgnorables(tokens);
- DomTree doc = parseDom(tokens);
- ignoreTopLevelIgnorables(tokens);
- tokens.expectEmpty();
- return doc;
+ return parseDocument(
+ tokens, OpenElementStack.Factory.createXmlElementStack());
+ }
+
+ public static DomTree parseDocument(
+ TokenQueue<HtmlTokenType> tokens, OpenElementStack elementStack)
+ throws ParseException {
+ // Make sure the elementStack is empty.
+ elementStack.open(false);
+
+ do {
+ parseDom(tokens, elementStack);
+ } while (!tokens.isEmpty());
+
+ FilePosition endPos = FilePosition.endOf(tokens.lastPosition());
+ try {
+ elementStack.finish(endPos);
+ } catch (IllegalDocumentStateException ex) {
+ throw new ParseException(ex.getCajaMessage(), ex);
+ }
+
+ DomTree root = elementStack.getRootElement();
+ System.err.println("root=" + root);
+ if (root.children().isEmpty()) {
+ throw new ParseException(new Message(
+ DomParserMessageType.MISSING_DOCUMENT_ELEMENT, endPos));
+ }
+ return root.children().get(0);
}
/**
@@ -55,11 +83,43 @@
*/
public static DomTree.Fragment
parseFragment(TokenQueue<HtmlTokenType> tokens)
throws ParseException {
- List<DomTree> topLevelNodes = new ArrayList<DomTree>();
- do {
- topLevelNodes.add(parseDom(tokens));
- } while (!tokens.isEmpty());
- return new DomTree.Fragment(topLevelNodes);
+ return parseFragment(
+ tokens, OpenElementStack.Factory.createXmlElementStack());
+ }
+
+ /**
+ * Parses a snippet of markup.
+ */
+ public static DomTree.Fragment parseFragment(
+ TokenQueue<HtmlTokenType> tokens, OpenElementStack elementStack)
+ throws ParseException {
+ // Make sure the elementStack is empty.
+ elementStack.open(true);
+
+ while (!tokens.isEmpty()) {
+ // Skip over top level comments, and whitespace only text nodes.
+ // Whitespace is significant for XML unless the schema specifies
+ // otherwise, but whitespace outside the root element is not. There is
+ // one exception for whitespace preceding the prologue.
+ Token<HtmlTokenType> t = tokens.peek();
+
+ if (HtmlTokenType.COMMENT == t.type
+ || HtmlTokenType.IGNORABLE == t.type) {
+ tokens.advance();
+ continue;
+ }
+
+ parseDom(tokens, elementStack);
+ }
+
+ FilePosition endPos = FilePosition.endOf(tokens.lastPosition());
+ try {
+ elementStack.finish(endPos);
+ } catch (IllegalDocumentStateException ex) {
+ throw new ParseException(ex.getCajaMessage(), ex);
+ }
+
+ return elementStack.getRootElement();
}
/**
@@ -84,52 +144,51 @@
}
/**
- * Skip over top level comments, and whitespace only text nodes.
- * Whitespace is significant for XML unless the schema specifies otherwise,
- * but whitespace outside the root element is not. There is one exception
- * for whitespace preceding the prologue.
- */
- private static void ignoreTopLevelIgnorables(TokenQueue<HtmlTokenType> tokens)
- throws ParseException {
- while (!tokens.isEmpty()) {
- Token<HtmlTokenType> t = tokens.peek();
-
- if (!(HtmlTokenType.IGNORABLE == t.type || HtmlTokenType.COMMENT == t.type
- || (HtmlTokenType.TEXT == t.type && "".equals(t.text.trim())))) {
- break;
- }
- tokens.advance();
- }
- }
-
- /**
* Parses a single top level construct, an element, or a text chunk from the
* given queue.
* @throws ParseException if elements are unbalanced (SGML instead of XML),
* attributes are missing values, there is no top level construct to parse,
* or there is a problem parsing the underlying stream.
*/
- private static DomTree parseDom(TokenQueue<HtmlTokenType> tokens)
+ private static void parseDom(TokenQueue<HtmlTokenType> tokens,
+ OpenElementStack out)
throws ParseException {
while (true) {
Token<HtmlTokenType> t = tokens.pop();
- Token<HtmlTokenType> end = t;
switch (t.type) {
case TAGBEGIN:
- if (isClose(t)) {
- throw new ParseException(new Message(
- MessageType.MALFORMED_XHTML, t.pos,
- MessagePart.Factory.valueOf(t.text)));
+ {
+ List<DomTree.Attrib> attribs;
+ Token<HtmlTokenType> end;
+ if (isClose(t)) {
+ attribs = Collections.<DomTree.Attrib>emptyList();
+ do {
+ // TODO(mikesamuel): if this is not a tagend, then we should
+ // require ignorable whitespace when we're parsing strictly.
+ end = tokens.pop();
+ } while (end.type != HtmlTokenType.TAGEND);
+ } else {
+ attribs = new ArrayList<DomTree.Attrib>();
+ end = parseTagAttributes(tokens, attribs);
+
+ for (DomTree.Attrib attrib : attribs) {
+ attrib.setAttribName(
+ out.canonicalizeAttributeName(attrib.getAttribName()));
+ }
+ }
+ attribs = Collections.unmodifiableList(attribs);
+ try {
+ out.processTag(t, end, attribs);
+ } catch (IllegalDocumentStateException ex) {
+ throw new ParseException(ex.getCajaMessage(), ex);
+ }
}
- List<DomTree> children = new ArrayList<DomTree>();
- end = parseElement(t, tokens, children);
- children = Collections.unmodifiableList(normalize(children));
- return new DomTree.Tag(children, t, end);
+ return;
case CDATA:
- return new DomTree.CData(t);
case TEXT:
case UNESCAPED:
- return new DomTree.Text(t);
+ out.processText(t);
+ return;
case COMMENT:
continue;
default:
@@ -141,11 +200,12 @@
}
/**
- * Parses an element from a balanced start and end tag or a unary tag.
+ * Parses attributes into {@code children}, then consumes and returns the
+ * end-of-tag token.
*/
- private static Token<HtmlTokenType> parseElement(
- Token<HtmlTokenType> start, TokenQueue<HtmlTokenType> tokens,
- List<DomTree> children)
+ private static Token<HtmlTokenType> parseTagAttributes(
+ TokenQueue<HtmlTokenType> tokens,
+ List<? super DomTree.Attrib> children)
throws ParseException {
Token<HtmlTokenType> last;
tokloop:
@@ -164,30 +224,6 @@
MessagePart.Factory.valueOf(last.text)));
}
}
- if (">".equals(last.text)) { // look for an end tag
- String tagName = tagName(start);
- while (true) {
- last = tokens.peek();
- if (last.type == HtmlTokenType.TAGBEGIN && isClose(last)) {
- if (!tagName.equals(tagName(last))) {
- throw new ParseException(new Message(
- MessageType.MISSING_ENDTAG, last.pos,
- MessagePart.Factory.valueOf("<" + tagName + ">"),
- MessagePart.Factory.valueOf(last.text + ">")));
- }
- tokens.advance();
- // Consume ignorable whitespace until we see a tagend
- while (tokens.peek().type == HtmlTokenType.IGNORABLE) {
- tokens.advance();
- }
- last = tokens.peek();
- tokens.expectToken(">");
- break;
- } else {
- children.add(parseDom(tokens));
- }
- }
- }
return last;
}
@@ -195,10 +231,12 @@
* Parses an attribute from a token stream.
* @param tokens a token queue whose head is a {@link HtmlTokenType#ATTRNAME}
*/
- private static DomTree parseAttrib(TokenQueue<HtmlTokenType> tokens)
+ private static DomTree.Attrib parseAttrib(TokenQueue<HtmlTokenType> tokens)
throws ParseException {
Token<HtmlTokenType> name = tokens.pop();
Token<HtmlTokenType> value = tokens.pop();
+ // TODO(mikesamuel): make sure that the XmlElementStack does not allow
+ // valueless attributes, and allow them here.
if (value.type != HtmlTokenType.ATTRVALUE) {
throw new ParseException(
new Message(MessageType.MALFORMED_XHTML,
@@ -213,39 +251,6 @@
*/
private static boolean isClose(Token<HtmlTokenType> t) {
return t.text.startsWith("</");
- }
-
- /** Extracts the tag name from a {@link HtmlTokenType#TAGBEGIN} token. */
- private static String tagName(Token<HtmlTokenType> t) {
- return t.text.substring(isClose(t) ? 2 : 1);
- }
-
- /** Collapse adjacent text nodes. */
- private static List<DomTree> normalize(List<DomTree> nodes) {
- for (int i = 0; i < nodes.size(); ++i) {
- DomTree node = nodes.get(i);
- if (HtmlTokenType.TEXT == node.getType()) {
- int j = i + 1;
- for (int n = nodes.size(); j < n; ++j) {
- if (HtmlTokenType.TEXT != nodes.get(j).getType()) { break; }
- }
- if (j - i > 1) {
- Token<HtmlTokenType> firstToken = node.getToken(),
- lastToken = nodes.get(j - 1).getToken();
- StringBuilder newText = new StringBuilder(firstToken.text);
- for (int k = i + 1; k < j; ++k) {
- newText.append(nodes.get(k).getToken().text);
- }
- Token<HtmlTokenType> normalToken = Token.<HtmlTokenType>instance(
- newText.toString(), HtmlTokenType.TEXT,
- FilePosition.span(firstToken.pos, lastToken.pos));
- nodes.set(i, new DomTree.Text(normalToken));
- nodes.subList(i + 1, j).clear();
- i = j - 1;
- }
- }
- }
- return nodes;
}
private DomParser() {
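The rewritten DomParser no longer builds the tree itself: it feeds tag and text events to an OpenElementStack (open, processTag, processText, finish, getRootElement) and lets the stack decide how to repair malformed markup. Below is a minimal, self-contained sketch of that event protocol, assuming a lenient stack that ignores stray end tags, implicitly closes inner elements, and collapses adjacent text. SimpleElementStack and its Node class are hypothetical and far simpler than Caja's XmlElementStack or Html5ElementStack (no file positions, attributes, or messages).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical, simplified model of the event protocol DomParser uses with an
 * OpenElementStack. Not Caja's implementation.
 */
public class SimpleElementStack {
  /** A minimal node: an element if name != null, otherwise a text node. */
  static final class Node {
    final String name;
    final StringBuilder text = new StringBuilder();
    final List<Node> children = new ArrayList<>();
    Node(String name) { this.name = name; }
  }

  private final Deque<Node> open = new ArrayDeque<>();
  private Node root;

  /** Resets the stack so that only a synthetic root element is open. */
  public void open() {
    open.clear();
    root = new Node("#root");
    open.push(root);
  }

  /** Handles a start tag (isEndTag == false) or an end tag. */
  public void processTag(String name, boolean isEndTag) {
    if (!isEndTag) {
      Node el = new Node(name);
      open.peek().children.add(el);
      open.push(el);
      return;
    }
    // Ignore an end tag whose element is not open.
    boolean isOpen = false;
    for (Node n : open) {  // iterates from innermost to outermost
      if (n != root && name.equals(n.name)) { isOpen = true; break; }
    }
    if (!isOpen) { return; }
    // Pop (implicitly close) inner elements until the matching one is popped.
    while (!name.equals(open.pop().name)) { }
  }

  /** Appends text, collapsing it into a preceding text node if there is one. */
  public void processText(String text) {
    List<Node> siblings = open.peek().children;
    Node last = siblings.isEmpty() ? null : siblings.get(siblings.size() - 1);
    if (last == null || last.name != null) {
      last = new Node(null);
      siblings.add(last);
    }
    last.text.append(text);
  }

  /** Closes every element still open at end of input. */
  public void finish() {
    while (open.size() > 1) { open.pop(); }
  }

  public Node getRoot() { return root; }

  /** Serializes the tree so results are easy to compare. */
  static String render(Node n) {
    if (n.name == null) { return n.text.toString(); }
    StringBuilder sb = new StringBuilder();
    for (Node c : n.children) { sb.append(render(c)); }
    if ("#root".equals(n.name)) { return sb.toString(); }
    return "<" + n.name + ">" + sb + "</" + n.name + ">";
  }

  public static void main(String[] args) {
    SimpleElementStack s = new SimpleElementStack();
    s.open();
    s.processTag("p", false);
    s.processText("a");
    s.processTag("b", false);
    s.processText("b");
    s.processTag("p", true);  // implicitly closes <b>
    s.processTag("i", true);  // stray end tag: ignored
    s.processText("c");
    s.processText("d");       // merged with "c"
    s.finish();
    System.out.println(render(s.getRoot()));  // <p>a<b>b</b></p>cd
  }
}
```

A strict, XML-style stack would be expected to reject the mismatched end tag rather than repair it; keeping that policy behind one interface is what lets DomParser stay the same for both modes.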
Added: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/parser/html/DomParserMessageType.java
==============================================================================
--- (empty file)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/parser/html/DomParserMessageType.java	Thu Dec 20 08:56:04 2007
@@ -0,0 +1,62 @@
+// Copyright (C) 2007 Google Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package com.google.caja.parser.html;
+
+import com.google.caja.reporting.MessageContext;
+import com.google.caja.reporting.MessageLevel;
+import com.google.caja.reporting.MessagePart;
+import com.google.caja.reporting.MessageType;
+import com.google.caja.reporting.MessageTypeInt;
+
+import java.io.IOException;
+
+/**
+ * Messages for the DOM parser.
+ *
+ * @author mikes...@gmail.com
+ */
+public enum DomParserMessageType implements MessageTypeInt {
+ UNMATCHED_END("%s: end tag %s does not match open tag %s",
+ MessageLevel.FATAL_ERROR),
+ MISPLACED_TAG("%s: tag %s should appear in %s", MessageLevel.WARNING),
+ MISSING_END("%s: element %s at %s is not closed",
+ MessageLevel.WARNING),
+ IGNORING_TOKEN("%s: ignoring token %s", MessageLevel.WARNING),
+ MOVING_TO_HEAD("%s: moving element %s to head", MessageLevel.LINT),
+ MISSING_DOCUMENT_ELEMENT("%s: no document element", MessageLevel.ERROR),
+ GENERIC_SAX_ERROR("%s: %s", MessageLevel.FATAL_ERROR),
+ ;
+
+ private final String formatString;
+ private final MessageLevel level;
+ private final int paramCount;
+
+ DomParserMessageType(String formatString, MessageLevel level) {
+ this.formatString = formatString;
+ this.level = level;
+ this.paramCount = MessageType.formatStringArity(formatString);
+ }
+
+ public int getParamCount() {
+ return paramCount;
+ }
+
+ public void format(MessagePart[] parts, MessageContext context,
+ Appendable out) throws IOException {
+ MessageType.formatMessage(formatString, parts, context, out);
+ }
+
+ public MessageLevel getLevel() { return level; }
+}
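DomParserMessageType derives each message's parameter count from its format string once, in the constructor, via MessageType.formatStringArity. The sketch below mirrors that pattern in plain Java, assuming arity means "number of %s placeholders"; SketchMessageType, arity, and format are hypothetical and not Caja's MessageType code, which may handle placeholders differently.

```java
/**
 * Hypothetical sketch of a message-type enum whose parameter count is derived
 * from its format string, in the style of DomParserMessageType.
 */
public enum SketchMessageType {
  UNMATCHED_END("%s: end tag %s does not match open tag %s"),
  MISSING_DOCUMENT_ELEMENT("%s: no document element");

  private final String formatString;
  private final int paramCount;

  SketchMessageType(String formatString) {
    this.formatString = formatString;
    // Calling a static method is allowed here, whereas referencing a
    // non-constant static field from an enum constructor is a compile error.
    this.paramCount = arity(formatString);
  }

  /** Counts "%s" placeholders (assumed arity rule). */
  static int arity(String formatString) {
    int n = 0;
    for (int i = formatString.indexOf("%s"); i >= 0;
         i = formatString.indexOf("%s", i + 2)) {
      ++n;
    }
    return n;
  }

  public int getParamCount() { return paramCount; }

  /** Formats the message, rejecting the wrong number of parts. */
  public String format(Object... parts) {
    if (parts.length != paramCount) {
      throw new IllegalArgumentException(
          name() + " expects " + paramCount + " parts but got " + parts.length);
    }
    return String.format(formatString, parts);
  }

  public static void main(String[] args) {
    // Prints: test.html:3: end tag </b> does not match open tag <i>
    System.out.println(UNMATCHED_END.format("test.html:3", "</b>", "<i>"));
  }
}
```

Computing the arity once means a call site that passes the wrong number of MessageParts can be caught early instead of producing a silently garbled message.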
Modified: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/parser/html/DomTree.java
==============================================================================
--- /trunk/src/java/com/google/caja/parser/html/DomTree.java (original)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/parser/html/DomTree.java	Thu Dec 20 08:56:04 2007
@@ -16,6 +16,7 @@
import com.google.caja.lexer.CharProducer;
import com.google.caja.lexer.FilePosition;
+import com.google.caja.lexer.HtmlTextEscapingMode;
import com.google.caja.lexer.HtmlTokenType;
import com.google.caja.lexer.InputSource;
import com.google.caja.lexer.Token;
@@ -48,14 +49,18 @@
private final Token<HtmlTokenType> start;
private String value;
- DomTree(List<DomTree> children,
+ DomTree(List<? extends DomTree> children,
Token<HtmlTokenType> start, Token<HtmlTokenType> end) {
- this(children, start, FilePosition.span(start.pos, end.pos));
+ this(children, start,
+ start.pos != null ? FilePosition.span(start.pos, end.pos) : null);
}
- DomTree(List<DomTree> children, Token<HtmlTokenType> tok, FilePosition pos) {
+ DomTree(List<? extends DomTree> children, Token<HtmlTokenType> tok,
+ FilePosition pos) {
this.start = tok;
- setFilePosition(pos);
+ if (pos != null) {
+ setFilePosition(pos);
+ }
createMutation().appendChildren(children).execute();
}
@@ -78,6 +83,10 @@
return value;
}
+ protected final void setValue(String value) {
+ this.value = value;
+ }
+
private String computeValue() {
switch (start.type) {
case TAGBEGIN:
@@ -87,8 +96,13 @@
case ATTRVALUE:
{
String s = start.text;
- if (s.length() >= 2 && ('"' == s.charAt(0) || '\'' == s.charAt(0))) {
- s = s.substring(1, s.length() - 1);
+ int n = s.length();
+ if (n >= 2) {
+ char ch0 = s.charAt(0);
+ if (s.charAt(n - 1) == ch0
+ && ('"' == ch0 || '\'' == ch0 || ch0 == '`')) {
+ s = s.substring(1, n - 1);
+ }
}
return xmlDecode(s);
}
@@ -146,9 +160,13 @@
* This can represent a snippet of markup.
*/
public static final class Fragment extends DomTree {
- public Fragment(List<DomTree> children) {
+ public Fragment() {
+ super(Collections.<DomTree>emptyList(), NULL_TOKEN, NULL_TOKEN);
+ }
+
+ public Fragment(List<? extends DomTree> children) {
super(children,
- new Token<HtmlTokenType>(
+ Token.instance(
"", HtmlTokenType.IGNORABLE,
FilePosition.startOf(children.get(0).getFilePosition())),
FilePosition.span(
@@ -162,6 +180,9 @@
}
}
}
+ private static final Token<HtmlTokenType> NULL_TOKEN
+ = Token.instance("", HtmlTokenType.IGNORABLE, null);
+
/**
* A DOM element. This node's value is its tag name.
@@ -169,12 +190,21 @@
* other tags, text nodes, CDATA sections.
*/
public static final class Tag extends DomTree {
- public Tag(List<DomTree> children,
- Token<HtmlTokenType> start, Token<HtmlTokenType> end) {
+ public Tag(List<? extends DomTree> children,
+ Token<HtmlTokenType> start, Token<HtmlTokenType> end) {
super(children, start, end);
assert start.type == HtmlTokenType.TAGBEGIN;
+ assert end.type == HtmlTokenType.TAGEND;
}
+ public Tag(List<? extends DomTree> children,
+ Token<HtmlTokenType> start, FilePosition pos) {
+ super(children, start, pos);
+ assert start.type == HtmlTokenType.TAGBEGIN;
+ }
+
+ void setTagName(String canonicalTagName) { setValue(canonicalTagName); }
+
public String getTagName() { return getValue(); }
public void render(RenderContext r) throws IOException {
@@ -190,13 +220,24 @@
child.render(r);
++i;
}
- r.out.append('>');
- while (i < n) {
- children.get(i++).render(r);
+ if (i == n && HtmlTextEscapingMode.isVoidElement(getTagName())) {
+ // This is safe regardless of whether the output is XML or HTML since
+ // we only skip the end tag for HTML elements that don't require one,
+ // and the slash will cause XML to treat it as a void tag.
+ r.out.append(" />");
+ } else {
+ r.out.append('>');
+ while (i < n) {
+ children.get(i++).render(r);
+ }
+ // This is not correct for HTML <plaintext> nodes, but live with it,
+ // since handling plaintext correctly would require omitting end tags
+ // for parent nodes, and so significantly complicate rendering for a
+ // node we shouldn't ever render anyway.
+ r.out.append("</");
+ renderHtmlIdentifier(getTagName(), r);
+ r.out.append('>');
}
- r.out.append("</");
- renderHtmlIdentifier(getTagName(), r);
- r.out.append('>');
}
}
@@ -210,16 +251,30 @@
super(Collections.<DomTree>singletonList(value), start, end);
assert start.type == HtmlTokenType.ATTRNAME;
}
+ Attrib(Value value, Token<HtmlTokenType> start, FilePosition pos) {
+ super(Collections.<DomTree>singletonList(value), start, pos);
+ assert start.type == HtmlTokenType.ATTRNAME;
+ }
public String getAttribName() { return getValue(); }
public String getAttribValue() { return children().get(0).getValue(); }
public Value getAttribValueNode() { return (Value) children().get(0); }
+ void setAttribName(String canonicalName) { setValue(canonicalName); }
+
public void render(RenderContext r) throws IOException {
renderHtmlIdentifier(getAttribName(), r);
r.out.append("=\"");
getAttribValueNode().render(r);
r.out.append('"');
}
+
+ @Override
+ public Attrib clone() {
+ Attrib clone = new Attrib(
+ getAttribValueNode().clone(), getToken(), getFilePosition());
+ clone.setAttribName(getAttribName());
+ return clone;
+ }
}
/**
@@ -234,6 +289,11 @@
public void render(RenderContext r) throws IOException {
renderHtmlAttributeValue(getValue(), r);
}
+
+ @Override
+ public Value clone() {
+ return new Value(getToken());
+ }
}
/**
@@ -247,7 +307,12 @@
}
public void render(RenderContext r) throws IOException {
- renderHtml(getValue(), r);
+ if (getToken().type == HtmlTokenType.UNESCAPED) {
+ // TODO(mikesamuel): disallow this if the rendercontext specifies XML
+ r.out.append(getValue());
+ } else {
+ renderHtml(getValue(), r);
+ }
}
}
@@ -274,12 +339,12 @@
private static void renderHtmlIdentifier(String text, RenderContext r)
throws IOException {
- Escaping.escapeXml(text, true, r.out);
+ Escaping.escapeXml(text, true, r.out);
}
private static void renderHtmlAttributeValue(String text, RenderContext r)
throws IOException {
- Escaping.escapeXml(text, true, r.out);
+ Escaping.escapeXml(text, true, r.out);
}
private static void renderHtml(String text, RenderContext r)
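The DomTree change to ATTRVALUE handling now strips surrounding quotes only when the first and last characters are the same quote character, and also accepts backticks. The sketch below isolates just that step (it omits the subsequent xmlDecode call) so the old and new checks can be compared on an unterminated value such as "foo, where the old check dropped the final character. AttribQuotes, oldStrip, and newStrip are hypothetical names.

```java
/**
 * Hypothetical helper reproducing only the quote-stripping step of DomTree's
 * ATTRVALUE handling (before xmlDecode), old vs. new.
 */
public final class AttribQuotes {
  private AttribQuotes() {}

  /** Old behavior: strips first and last chars whenever the value starts with a quote. */
  static String oldStrip(String s) {
    if (s.length() >= 2 && ('"' == s.charAt(0) || '\'' == s.charAt(0))) {
      s = s.substring(1, s.length() - 1);
    }
    return s;
  }

  /** New behavior: strips only a matching pair of ", ', or ` quotes. */
  static String newStrip(String s) {
    int n = s.length();
    if (n >= 2) {
      char ch0 = s.charAt(0);
      if (s.charAt(n - 1) == ch0 && ('"' == ch0 || '\'' == ch0 || '`' == ch0)) {
        s = s.substring(1, n - 1);
      }
    }
    return s;
  }

  public static void main(String[] args) {
    // An unterminated value, as malformed markup can produce.
    System.out.println(oldStrip("\"foo"));  // fo
    System.out.println(newStrip("\"foo"));  // "foo
    System.out.println(newStrip("`bar`"));  // bar
  }
}
```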
Added: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/parser/html/Html5ElementStack.java
==============================================================================
--- (empty file)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/parser/html/Html5ElementStack.java	Thu Dec 20 08:56:04 2007
@@ -0,0 +1,722 @@
+// Copyright (C) 2007 Google Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package com.google.caja.parser.html;
+
+import com.google.caja.lexer.FilePosition;
+import com.google.caja.lexer.HtmlTokenType;
+import com.google.caja.lexer.Token;
+import com.google.caja.lexer.escaping.Escaping;
+import com.google.caja.parser.MutableParseTreeNode;
+import com.google.caja.reporting.Message;
+import com.google.caja.reporting.MessageLevel;
+import com.google.caja.reporting.MessagePart;
+import com.google.caja.reporting.MessageQueue;
+import com.google.caja.util.SyntheticAttributeKey;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Locale;
+import java.util.Set;
+
+import org.xml.sax.Attributes;
+import org.xml.sax.ErrorHandler;
+import org.xml.sax.SAXException;
+import org.xml.sax.SAXParseException;
+
+import nu.validator.htmlparser.common.XmlViolationPolicy;
+import nu.validator.htmlparser.common.DoctypeExpectation;
+import nu.validator.htmlparser.impl.Tokenizer;
+import nu.validator.htmlparser.impl.TreeBuilder;
+
+/**
+ * A bridge between DomParser and html5lib which translates
+ * {@code Token<HtmlTokenType>}s into SAX style events which are fed to the
+ * TreeBuilder. The TreeBuilder responds by issuing {@code createElement}
+ * commands which are used to build a {@link DomTree}.
+ *
+ * @author mikes...@gmail.com
+ */
+public class Html5ElementStack implements OpenElementStack {
+ private final CajaTreeBuilder builder = new CajaTreeBuilder();
+ private final char[] charBuf = new char[1024];
+ private final MessageQueue mq;
+ private boolean isFragment;
+
+ /** @param queue will receive error messages from html5lib. */
+ Html5ElementStack(MessageQueue queue) {
+ this.mq = queue;
+ }
+
+ /** {@inheritDoc} */
+ public void open(boolean isFragment) {
+ this.isFragment = isFragment;
+ builder.setDoctypeExpectation(DoctypeExpectation.NO_DOCTYPE_ERRORS);
+ try {
+ builder.start(new Tokenizer(builder));
+ } catch (SAXException ex) {
+ throw new RuntimeException(ex);
+ }
+ builder.setErrorHandler(
+ new ErrorHandler() {
+ private FilePosition lastPos;
+ private String lastMessage;
+
+ public void error(SAXParseException ex) {
+ // htmlparser is a bit strident, so we lower its errors and warnings to
+ // MessageLevel.LINT.
+ report(MessageLevel.LINT, ex);
+ }
+ public void fatalError(SAXParseException ex) {
+ report(MessageLevel.FATAL_ERROR, ex);
+ }
+ public void warning(SAXParseException ex) {
+ report(MessageLevel.LINT, ex);
+ }
+
+ private void report(MessageLevel level, SAXParseException ex) {
+ String message = errorMessage(ex);
+ FilePosition pos = builder.getErrorLocation();
+ if (message.equals(lastMessage) && pos.equals(lastPos)) { return; }
+ lastMessage = message;
+ lastPos = pos;
+ mq.getMessages().add(new Message(
+ DomParserMessageType.GENERIC_SAX_ERROR, level, pos,
+ MessagePart.Factory.valueOf(message)));
+ }
+
+ private String errorMessage(SAXParseException ex) {
+ // htmlparser quotes names with curly quotes (U+201C and U+201D);
+ // replace them with ASCII apostrophes.
+ return ex.getMessage()
+ .replace('\u201c', '\'').replace('\u201d', '\'');
+ }
+ });
+ }
+
+ /** {@inheritDoc} */
+ public void finish(FilePosition endOfFile) {
+ builder.finish(endOfFile);
+ builder.closeUnclosedNodes();
+ }
+
+ /** {@inheritDoc} */
+ public String canonicalizeElementName(String elementName) {
+ return canonicalElementName(elementName);
+ }
+
+ /** {@inheritDoc} */
+ public String canonicalizeAttributeName(String attributeName) {
+ return canonicalAttributeName(attributeName);
+ }
+
+ public static String canonicalElementName(String elementName) {
+ // Locale.ENGLISH forces LANG=C like behavior.
+ return elementName.toLowerCase(Locale.ENGLISH);
+ }
+
+ public static String canonicalAttributeName(String attributeName) {
+ // Locale.ENGLISH forces LANG=C like behavior.
+ return attributeName.toLowerCase(Locale.ENGLISH);
+ }
+
+ /** {@inheritDoc} */
+ public DomTree.Fragment getRootElement() {
+ // libHtmlParser always produces a document with html, head, and body tags
+ // which we usually don't want, so unroll it.
+ DomTree.Tag root = builder.getRootElement();
+ DomTree.Fragment result = new DomTree.Fragment();
+ result.setFilePosition(builder.getFragmentBounds());
+ if (!isFragment) {
+ result.appendChild(root);
+ return result;
+ }
+
+ final List<? extends DomTree> children = root.children();
+
+ // Don't discard the html, head, or body elements if doing so would lose
+ // information, so look for attributes and for tags other than head and body.
+ boolean tagsBesidesHeadAndBody = false;
+ boolean topLevelTagsWithAttributes = false;
+
+ for (DomTree child : children) {
+ if (child instanceof DomTree.Attrib) {
+ topLevelTagsWithAttributes = true;
+ break;
+ } else if (child instanceof DomTree.Tag) {
+ DomTree.Tag el = (DomTree.Tag) child;
+ if (!("head".equals(el.getTagName())
+ || "body".equals(el.getTagName()))) {
+ tagsBesidesHeadAndBody = true;
+ break;
+ }
+ if (!el.children().isEmpty()
+ && el.children().get(0) instanceof DomTree.Attrib) {
+ topLevelTagsWithAttributes = true;
+ break;
+ }
+ }
+ }
+
+ if (tagsBesidesHeadAndBody || topLevelTagsWithAttributes) {
+ // Merging the body and head would lose info.
+ result.appendChild(root);
+ return result;
+ }
+
+ // Merge the body and head into a fragment
+ MutableParseTreeNode.Mutation mutation = result.createMutation();
+ DomTree pending = null;
+ for (DomTree child : children) {
+ if (child instanceof DomTree.Tag) {
+ // Shallow descent
+ for (DomTree grandchild : child.children()) {
+ pending = appendNormalized(pending, grandchild, mutation);
+ }
+ } else {
+ pending = appendNormalized(pending, child, mutation);
+ }
+ }
+ if (pending != null) { mutation.appendChild(pending); }
+
+ mutation.execute();
+ return result;
+ }
+
+ /**
+ * Merges {@code pending} and {@code current} if both are text nodes and
+ * returns the combined node. Otherwise appends {@code pending} via
+ * {@code mut} (if it is non-null) and returns {@code current}, which becomes
+ * the new pending node.
+ */
+ private DomTree appendNormalized(
+ DomTree pending, DomTree current, MutableParseTreeNode.Mutation mut) {
+ if (pending == null) { return current; }
+ Token<HtmlTokenType> pendingToken = pending.getToken();
+ Token<HtmlTokenType> currentToken = current.getToken();
+ if (!(HtmlTokenType.TEXT == pendingToken.type
+ && HtmlTokenType.TEXT == currentToken.type)) {
+ mut.appendChild(pending);
+ return current;
+ }
+ return new DomTree.Text(
+ Token.instance(
+ pendingToken.text + currentToken.text, HtmlTokenType.TEXT,
+ FilePosition.span(pendingToken.pos, currentToken.pos)));
+ }
+
+ /**
+ * Records the fact that a tag has been seen, updating internal state.
+ *
+ * @param start the token at the beginning of the tag, so {@code "<p"} for a
+ * paragraph start, {@code "</p"} for an end tag.
+ * @param end the token at the end of the tag, so {@code ">"} for a
+ * paragraph start, {@code "/>"} for a unary break tag.
+ * @param attrs the attributes for the element. This will be empty
+ * for end tags.
+ */
+ public void processTag(Token<HtmlTokenType> start, Token<HtmlTokenType> end,
+ List<DomTree.Attrib> attrs) {
+ builder.setTokenContext(start, end);
+ AttributesImpl attrsWrapped = new AttributesImpl(attrs);
+ try {
+ String tagName = CajaTreeBuilder.tagName(start.text);
+ if (CajaTreeBuilder.isEndTag(start.text)) {
+ builder.endTag(tagName, attrsWrapped);
+ } else {
+ builder.startTag(tagName, attrsWrapped);
+ }
+ } catch (SAXException ex) {
+ throw new RuntimeException(ex);
+ }
+ }
+
+ /**
+ * Adds the given text node to the DOM.
+ */
+ public void processText(Token<HtmlTokenType> textToken) {
+ builder.setTokenContext(textToken, textToken);
+ String text = textToken.text;
+ char[] chars;
+ int n = text.length();
+ if (n <= charBuf.length) {
+ chars = charBuf;
+ text.getChars(0, n, chars, 0);
+ } else {
+ chars = text.toCharArray();
+ }
+ try {
+ builder.characters(chars, 0, n);
+ } catch (SAXException ex) {
+ throw new RuntimeException(ex);
+ }
+ }
+}
+
+/**
+ * Bridges between html5lib's TreeBuilder which actually builds the DOM, and
+ * HtmlLexer which produces tokens. This does a bit of accounting to make sure
+ * that file positions are preserved on all DOM, text, and attribute nodes.
+ *
+ * @author mikes...@gmail.com
+ */
+final class CajaTreeBuilder extends TreeBuilder<DomTree> {
+ /** Maintain parent chains while the DOM is being built. */
+ private static final SyntheticAttributeKey<DomTree> PARENT
+ = new SyntheticAttributeKey<DomTree>(DomTree.class, "parent");
+ private static final boolean DEBUG = false;
+
+ // Keep track of the tokens bounding the section we're processing so that
+ // we can compute file positions for all added nodes.
+ private Token<HtmlTokenType> startTok;
+ private Token<HtmlTokenType> endTok;
+ // The root html element. TreeBuilder always creates a valid tree with
+ // html, head, and body elements.
+ private DomTree.Tag rootElement;
+ // Used to compute the spanning file position on the overall document. Since
+ // nodes can move around we can't easily compute this without looking at all
+ // descendants.
+ private FilePosition fragmentBounds;
+
+ CajaTreeBuilder() {
+ super(
+ // Allow loose parsing
+ XmlViolationPolicy.ALLOW,
+ // Don't coalesce text so that we can apply file positions.
+ false);
+ setIgnoringComments(true);
+ setScriptingEnabled(true); // Affects behavior of noscript
+ }
+
+ DomTree.Tag getRootElement() {
+ return rootElement;
+ }
+
+ FilePosition getFragmentBounds() {
+ return fragmentBounds;
+ }
+
+ FilePosition getErrorLocation() {
+ return (startTok.pos != endTok.pos
+ ? FilePosition.span(startTok.pos, endTok.pos)
+ : startTok.pos);
+ }
+
+ void setTokenContext(Token<HtmlTokenType> start, Token<HtmlTokenType> end) {
+ if (DEBUG) {
+ System.err.println(
+ "*** considering " + start.toString().replace("\n", "\\n"));
+ }
+ startTok = start;
+ endTok = end;
+ if (fragmentBounds == null) { fragmentBounds = start.pos; }
+ }
+
+ void finish(FilePosition pos) {
+ Token<HtmlTokenType> eofToken = Token.instance(
+ "", HtmlTokenType.IGNORABLE, pos);
+ setTokenContext(eofToken, eofToken);
+ fragmentBounds = FilePosition.span(fragmentBounds, pos);
+ try {
+ eof(); // Signal that we can close the html node now.
+ } catch (SAXException ex) {
+ throw new RuntimeException(ex);
+ }
+ }
+
+ @Override
+ protected void appendCommentToDocument(char[] buf, int start, int length) {}
+
+ @Override
+ protected void appendComment(DomTree el, char[] buf, int start, int length) {}
+
+ @Override
+ protected void appendCharacters(
+ DomTree el, char[] buf, int start, int length) {
+ insertCharactersBefore(buf, start, length, null, el);
+ }
+
+ @Override
+ protected void insertCharactersBefore(
+ char[] buf, int start, int length, DomTree sibling, DomTree parent) {
+ // Normalize text by adding to an existing text node.
+ List<? extends DomTree> siblings = parent.children();
+
+ int siblingIndex = siblings.indexOf(sibling);
+ if (siblingIndex < 0) {
+ siblingIndex = siblings.size();
+ }
+ if (siblingIndex > 0) {
+ DomTree priorSibling = siblings.get(siblingIndex - 1);
+ if (priorSibling instanceof DomTree.Text
+ && priorSibling.getToken().type == HtmlTokenType.TEXT) {
+ // Normalize the DOM by collapsing adjacent text nodes.
+ Token<HtmlTokenType> previous = priorSibling.getToken();
+ StringBuilder sb = new StringBuilder(previous.text);
+ sb.append(buf, start, length);
+ Token<HtmlTokenType> combined = Token.instance(
+ sb.toString(), previous.type,
+ FilePosition.span(previous.pos, endTok.pos));
+ parent.replaceChild(p(new DomTree.Text(combined), parent),
+ up(priorSibling));
+ return;
+ }
+ }
+
+ Token<HtmlTokenType> tok = this.startTok;
+ if (!bufferMatches(buf, start, length, tok.text)) {
+ tok = Token.instance(String.valueOf(buf, start, length),
+ HtmlTokenType.TEXT, endTok.pos);
+ }
+ insertBefore(new DomTree.Text(tok), null, parent);
+ }
+
+ @Override
+ protected void addAttributesToElement(DomTree el, Attributes attributes) {
+ int n = attributes.getLength();
+ if (n == 0) { return; }
+
+ // This method is used to merge multiple body elements together. Since
+ // it's illegal to have multiple attributes with the same name, we need
+ // to filter. The spec says that firstcomers win.
+ Set<String> have = new HashSet<String>();
+ DomTree nodeAfterLastAttrib = null;
+ for (DomTree child : el.children()) {
+ if (!(child instanceof DomTree.Attrib)) {
+ nodeAfterLastAttrib = child;
+ break;
+ }
+ DomTree.Attrib attr = (DomTree.Attrib) child;
+ have.add(Html5ElementStack.canonicalAttributeName(attr.getAttribName()));
+ }
+
+ MutableParseTreeNode.Mutation mut = el.createMutation();
+ for (DomTree.Attrib attr : getAttributes(attributes)) {
+ if (have.add(
+ Html5ElementStack.canonicalAttributeName(attr.getAttribName()))) {
+ mut.insertBefore(attr, nodeAfterLastAttrib);
+ }
+ }
+ mut.execute();
+ }
+
+ @Override
+ protected void insertBefore(DomTree child, DomTree sibling, DomTree parent) {
+ parent.insertBefore(p(child, parent), sibling);
+ }
+
+ @Override
+ protected DomTree parentElementFor(DomTree child) {
+ return child.getAttributes().get(PARENT);
+ }
+
+ @Override
+ protected void appendChildrenToNewParent(
+ DomTree oldParent, DomTree newParent) {
+ if (DEBUG) {
+ System.err.println(
+ "Appending children of " + oldParent + " to " + newParent);
+ }
+ List<? extends DomTree> children = oldParent.children();
+ if (oldParent == newParent || oldParent.children().isEmpty()) { return; }
+ MutableParseTreeNode.Mutation oldMut = oldParent.createMutation();
+ MutableParseTreeNode.Mutation newMut = newParent.createMutation();
+ for (DomTree child : children) {
+ // Attributes are not considered children in DOM Level 2.
+ if (child instanceof DomTree.Attrib) { continue; }
+ oldMut.removeChild(child);
+ newMut.appendChild(p(child, newParent));
+ }
+ oldMut.execute();
+ newMut.execute();
+ }
+
+ @Override
+ protected void detachFromParentAndAppendToNewParent(
+ DomTree child, DomTree newParent) {
+ DomTree oldParent = parentElementFor(child);
+ if (DEBUG) {
+ System.err.println("detach " + child + " and append to " + newParent
+ + ", oldParent=" + oldParent);
+ }
+ if (oldParent != null) {
+ oldParent.removeChild(child);
+ }
+ newParent.appendChild(p(child, newParent));
+ }
+
+ @Override
+ protected DomTree shallowClone(DomTree el) {
+ if (DEBUG) { System.err.println("cloning " + el); }
+ if (el instanceof DomTree.Tag) {
+ List<DomTree.Attrib> attribs = new ArrayList<DomTree.Attrib>();
+ // Shallow clone includes attributes since they're not really children.
+ for (DomTree child : el.children()) {
+ if (!(child instanceof DomTree.Attrib)) { break; }
+ attribs.add((DomTree.Attrib) child.clone());
+ }
+ return new DomTree.Tag(attribs, el.getToken(), el.getFilePosition());
+ } else if (el instanceof DomTree.Fragment) {
+ DomTree.Fragment clone = new DomTree.Fragment();
+ clone.setFilePosition(el.getFilePosition());
+ return clone;
+ } else {
+ throw new IllegalArgumentException();
+ }
+ }
+
+ @Override
+ protected boolean hasChildren(DomTree element) {
+ List<? extends DomTree> children = element.children();
+ // If the last child is an attribute then we don't have any non-attribute
+ // children.
+ return !(children.isEmpty()
+ || children.get(children.size() - 1) instanceof DomTree.Attrib);
+ }
+
+ @Override
+ protected void detachFromParent(DomTree element) {
+ if (DEBUG) { System.err.println("detach " + element + " from parent"); }
+ parentElementFor(element).removeChild(element);
+ up(element);
+ }
+
+ @Override
+ protected DomTree createHtmlElementSetAsRoot(Attributes attributes) {
+ DomTree.Tag documentElement = createElement("html", attributes);
+ if (DEBUG) { System.err.println("Created root " + documentElement); }
+ this.rootElement = documentElement;
+ return documentElement;
+ }
+
+ @Override
+ protected DomTree.Tag createElement(String name, Attributes attributes) {
+ if (DEBUG) { System.err.println("Created element " + name); }
+ Token<HtmlTokenType> elStartTok;
+ FilePosition pos;
+ name = Html5ElementStack.canonicalElementName(name);
+ if (startTok == null) {
+ elStartTok = Token.instance("<" + name, HtmlTokenType.TAGBEGIN, null);
+ pos = null;
+ } else if (startTok.type == HtmlTokenType.TAGBEGIN
+ && tagMatchesElementName(tagName(startTok.text), name)) {
+ elStartTok = startTok;
+ pos = FilePosition.span(startTok.pos, endTok.pos);
+ } else {
+ pos = FilePosition.startOf(startTok.pos);
+ elStartTok = Token.instance("<" + name, HtmlTokenType.TAGBEGIN, pos);
+ }
+ DomTree.Tag el
+ = new DomTree.Tag(getAttributes(attributes), elStartTok, pos);
+ el.setTagName(name);
+ return el;
+ }
+
+ // Track unpopped elements since a </html> tag will not close tables and
+ // other scoping elements.
+ Set<DomTree> unpoppedElements = new HashSet<DomTree>();
+
+ @Override
+ protected void elementPopped(String name, DomTree node) {
+ name = Html5ElementStack.canonicalElementName(name);
+ if (DEBUG) { System.err.println("popped " + name + ", node=" + node); }
+ FilePosition endPos;
+ if (startTok.type == HtmlTokenType.TAGBEGIN
+ && (isEndTag(startTok.text) || "select".equals(name))
+ && tagCloses(tagName(startTok.text), name)) {
+ endPos = endTok.pos;
+ } else {
+ // Implied ending.
+ endPos = FilePosition.startOf(startTok.pos);
+ }
+ FilePosition startPos = node.getFilePosition();
+ if (startPos == null) {
+ startPos = node.children().isEmpty()
+ ? endPos : node.children().get(0).getFilePosition();
+ }
+ if (endPos.endCharInFile() >= startPos.endCharInFile()) {
+ node.setFilePosition(FilePosition.span(startPos, endPos));
+ }
+ unpoppedElements.remove(node);
+ }
+
+ // htmlparser does not generate elementPopped events for the html or body
+ // elements, or for void elements.
+ @Override
+ protected void bodyClosed(DomTree body) {
+ elementPopped("body", body);
+ }
+
+ @Override
+ protected void htmlClosed(DomTree html) {
+ elementPopped("html", html);
+ }
+
+ @Override
+ protected void elementPushed(String name, DomTree node) {
+ if (DEBUG) { System.err.println("pushed " + name + ", node=" + node); }
+ unpoppedElements.add(node);
+ }
+
+ // Make sure that the end file position is correct for elements still open
+ // when EOF is reached.
+ void closeUnclosedNodes() {
+ for (DomTree node : unpoppedElements) {
+ node.setFilePosition(
+ FilePosition.span(node.getFilePosition(), endTok.pos));
+ }
+ unpoppedElements.clear();
+ }
+
+ private static boolean bufferMatches(
+ char[] buf, int start, int len, String s) {
+ if (len != s.length()) { return false; }
+ for (int i = len; --i >= 0;) {
+ if (s.charAt(i) != buf[start + i]) { return false; }
+ }
+ return true;
+ }
+
+ // Set up the parent chain so we can simulate org.w3c.dom.Node.getParentNode.
+ private static <T extends DomTree> T p(T el, DomTree parent) {
+ el.getAttributes().set(PARENT, parent);
+ return el;
+ }
+
+ private static <T extends DomTree> T up(T el) {
+ el.getAttributes().set(PARENT, null);
+ return el;
+ }
+
+ // htmlparser passes around an org.xml.sax Attributes list which is a
+ // String->String map, but I want to use DomTree.Attrib nodes since they
+ // have position info. htmlparser in some cases does create its own
+ // Attributes instances, such as when it is expanding a tag to emulate
+ // deprecated elements.
+ private List<DomTree.Attrib> getAttributes(Attributes attributes) {
+ if (attributes instanceof AttributesImpl) {
+ return ((AttributesImpl) attributes).getAttributes();
+ }
+ // There might be attributes here, but only for emulated tags, such as the
+ // mess that is "isindex".
+ int n = attributes.getLength();
+ if (n == 0) {
+ return Collections.<DomTree.Attrib>emptyList();
+ } else {
+ List<DomTree.Attrib> fakeAttribs = new ArrayList<DomTree.Attrib>();
+ FilePosition pos = FilePosition.startOf(startTok.pos);
+ for (int i = 0; i < n; ++i) {
+ String name = attributes.getLocalName(i);
+ StringBuilder sb = new StringBuilder();
+ sb.append('"');
+ Escaping.escapeXml(attributes.getValue(i), false, sb);
+ sb.append('"');
+ String encodedValue = sb.toString();
+ fakeAttribs.add(
+ new DomTree.Attrib(
+ new DomTree.Value(
+ Token.instance(encodedValue, HtmlTokenType.ATTRVALUE, pos)),
+ Token.instance(name, HtmlTokenType.ATTRNAME, pos), pos));
+ }
+ return fakeAttribs;
+ }
+ }
+
+ // the start token text is either <name or </name for a tag
+ static boolean isEndTag(String tokenText) {
+ return tokenText.length() >= 2 && tokenText.charAt(1) == '/';
+ }
+
+ static String tagName(String tokenText) {
+ String name = tokenText.substring(isEndTag(tokenText) ? 2 : 1);
+ // Intern since the TreeBuilder likes to compare strings by reference.
+ return Html5ElementStack.canonicalElementName(name).intern();
+ }
+
+ static boolean tagMatchesElementName(String tagName, String elementName) {
+ return tagName.equals(elementName)
+ || (tagName.equals("image") && elementName.equals("img"));
+ }
+
+ /**
+ * true if a close tag with the given name closes an element with the
+ * given name.
+ */
+ static boolean tagCloses(String tagName, String elementName) {
+ return tagMatchesElementName(tagName, elementName)
+ || (isHeading(tagName) && isHeading(elementName));
+ }
+
+ /** true for h1, h2, ... */
+ static boolean isHeading(String tagName) {
+ if (tagName.length() != 2 || 'h' != tagName.charAt(0)) { return false; }
+ char ch1 = tagName.charAt(1);
+ return ch1 >= '1' && ch1 <= '6';
+ }
+}
+
+/**
+ * An implementation of org.xml.sax.Attributes that wraps {@code DomTree.Attrib}s.
+ * This ignores all namespacing since HTML doesn't do namespacing.
+ */
+final class AttributesImpl implements Attributes {
+ private final List<DomTree.Attrib> attribs;
+
+ AttributesImpl(List<DomTree.Attrib> attribs) { this.attribs = attribs; }
+
+ public int getIndex(String qName) {
+ int index = 0;
+ for (DomTree.Attrib attrib : attribs) {
+ if (attrib.getAttribName().equals(qName)) { return index; }
+ ++index;
+ }
+ return -1;
+ }
+
+ public int getIndex(String uri, String localName) {
+ return getIndex(localName);
+ }
+
+ public int getLength() { return attribs.size(); }
+
+ public String getLocalName(int index) {
+ return attribs.get(index).getAttribName();
+ }
+
+ public String getQName(int index) { return getLocalName(index); }
+
+ public String getType(int index) { return null; }
+
+ public String getType(String qName) { return null; }
+
+ public String getType(String uri, String localName) { return null; }
+
+ public String getURI(int index) { return null; }
+
+ public String getValue(int index) {
+ return attribs.get(index).getAttribValue();
+ }
+
+ public String getValue(String qName) {
+ int index = getIndex(qName);
+ return index < 0 ? null : getValue(index);
+ }
+
+ public String getValue(String uri, String localName) {
+ return getValue(localName);
+ }
+
+ public List<DomTree.Attrib> getAttributes() {
+ return Collections.unmodifiableList(attribs);
+ }
+}
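For readers tracing how Html5ElementStack decides whether a tag closes an open element, the static string helpers above can be exercised in isolation. The following is a minimal sketch, not part of this commit. The class name TagNameSketch is hypothetical. Canonicalization is assumed to be plain ASCII lower-casing, which stands in for Html5ElementStack.canonicalElementName. The String.intern() call that the real tagName makes for htmlparser's by-reference comparisons is omitted.

```java
import java.util.Locale;

/**
 * Hypothetical restatement of the tag-name helpers, for illustration only.
 * Assumption: canonicalization is ASCII lower-casing.
 */
final class TagNameSketch {
  private TagNameSketch() {}

  // A start token's text is "<name" for a start tag or "</name" for an end tag.
  static boolean isEndTag(String tokenText) {
    return tokenText.length() >= 2 && tokenText.charAt(1) == '/';
  }

  // Strips "<" or "</" and canonicalizes what remains (no interning here).
  static String tagName(String tokenText) {
    String name = tokenText.substring(isEndTag(tokenText) ? 2 : 1);
    return name.toLowerCase(Locale.ENGLISH);
  }

  // <image> is accepted as an alias for an <img> element.
  static boolean tagMatchesElementName(String tagName, String elementName) {
    return tagName.equals(elementName)
        || (tagName.equals("image") && elementName.equals("img"));
  }

  // True for h1 through h6 only.
  static boolean isHeading(String tagName) {
    if (tagName.length() != 2 || tagName.charAt(0) != 'h') { return false; }
    char ch1 = tagName.charAt(1);
    return ch1 >= '1' && ch1 <= '6';
  }

  // Any heading end tag closes any open heading, e.g. </h3> closes <h1>.
  static boolean tagCloses(String tagName, String elementName) {
    return tagMatchesElementName(tagName, elementName)
        || (isHeading(tagName) && isHeading(elementName));
  }
}
```

Under these rules a stray `</h3>` closes an open `<h1>`, and an `<image>` tag matches an `img` element, which is the check createElement uses to decide whether it can reuse the original start token.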
Added: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/parser/html/IllegalDocumentStateException.java
==============================================================================
--- (empty file)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/parser/html/IllegalDocumentStateException.java Thu Dec 20 08:56:04 2007
@@ -0,0 +1,34 @@
+// Copyright (C) 2007 Google Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package com.google.caja.parser.html;
+
+import com.google.caja.CajaException;
+import com.google.caja.reporting.Message;
+
+/**
+ * Thrown when an HTML token cannot be incorporated into a document that is
+ * in the process of being built.
+ *
+ * @author mikes...@gmail.com
+ */
+public class IllegalDocumentStateException extends CajaException {
+ public IllegalDocumentStateException(Message msg, Throwable cause) {
+ super(msg, cause);
+ }
+
+ public IllegalDocumentStateException(Message msg) {
+ super(msg);
+ }
+}
Added: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/parser/html/OpenElementStack.java
==============================================================================
--- (empty file)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/parser/html/OpenElementStack.java Thu Dec 20 08:56:04 2007
@@ -0,0 +1,126 @@
+// Copyright (C) 2007 Google Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package com.google.caja.parser.html;
+
+import com.google.caja.lexer.FilePosition;
+import com.google.caja.lexer.HtmlTokenType;
+import com.google.caja.lexer.Token;
+import com.google.caja.reporting.MessageQueue;
+import java.util.List;
+
+/**
+ * Consumes SAX-style events (tag name and attributes) from the
+ * {@link DomParser} to build a DomTree.
+ *
+ * <p>
+ * Instances of this class are not reusable over multiple parses.
+ *
+ * <p>
+ * The {@link OpenElementStack.Factory Factory} class provides two
+ * implementations of this interface: one for
+ * {@link OpenElementStack.Factory#createHtml5ElementStack(MessageQueue) HTML}
+ * and a trivial one for all
+ * {@link OpenElementStack.Factory#createXmlElementStack() XML}, including XHTML.
+ *
+ * @see <a href="http://www.whatwg.org/specs/web-apps/current-work/">HTML5</a>
+ * @see <a href="http://www.w3.org/TR/REC-xml/">XML</a>
+ * @see <a href="http://james.html5.org/parsetree.html">HTML5 Validator</a>
+ * @see <a href="http://html5lib.googlecode.com/svn/trunk/testdata/">Tests</a>
+ * @see <a href="http://wiki.whatwg.org/wiki/Parser_tests">More Tests</a>
+ *
+ * @author mikes...@gmail.com
+ */
+public interface OpenElementStack {
+
+ /**
+ * The root element.
+ */
+ DomTree.Fragment getRootElement();
+
+ /**
+ * Given an element name, return a canonical element name.
+ * <p>
+ * This API does not currently handle namespace-aware XML.
+ * <p>
+ * Since this method canonicalizes, it is idempotent. It must be idempotent
+ * even if the input is not canonicalizable to a name in an HTML or XML
+ * schema.
+ */
+ String canonicalizeElementName(String elementName);
+
+ /**
+ * Given an attribute name, return a canonical attribute name.
+ * <p>
+ * This API does not currently handle namespace-aware XML.
+ * <p>
+ * Since this method canonicalizes, it is idempotent. It must be idempotent
+ * even if the input is not canonicalizable to a name in an HTML or XML
+ * schema.
+ */
+ String canonicalizeAttributeName(String attributeName);
+
+ /**
+ * Records the fact that a tag has been seen, updating internal state.
+ *
+ * @param start the token at the beginning of the tag, so {@code "<p"} for a
+ * paragraph start, {@code "</p"} for an end tag.
+ * @param end the token at the end of the tag, so {@code ">"} for a
+ * paragraph start, {@code "/>"} for a unary break tag.
+ * @param attrs the attributes for the element. This will be empty
+ * for end tags.
+ */
+ void processTag(Token<HtmlTokenType> start, Token<HtmlTokenType> end,
+ List<DomTree.Attrib> attrs)
+ throws IllegalDocumentStateException;
+
+ /**
+ * Adds the given text node to the DOM.
+ */
+ void processText(Token<HtmlTokenType> text);
+
+ /**
+ * Called before parsing starts.
+ *
+ * @param isFragment true to parse a fragment, not a full html document.
+ */
+ void open(boolean isFragment);
+
+ /**
+ * Checks that the document is in a consistent state, i.e. that all
+ * elements that need to be closed have been properly closed.
+ * <p>
+ * This method may modify the DOM, e.g. by removing ignorable text nodes from
+ * the root to ensure a single document element.
+ *
+ * @param endOfFile position at which parsing ends.
+ */
+ void finish(FilePosition endOfFile)
+ throws IllegalDocumentStateException;
+
+ /**
+ * Factory methods for the implementations of this interface.
+ */
+ public static final class Factory {
+ public static OpenElementStack createHtml5ElementStack(MessageQueue mq) {
+ return new Html5ElementStack(mq);
+ }
+
+ public static OpenElementStack createXmlElementStack() {
+ return new XmlElementStack();
+ }
+
+ private Factory() {}
+ }
+}
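The canonicalization methods above must be idempotent even for names outside any HTML or XML schema. Below is a minimal sketch of two canonicalizers that satisfy that contract, with a helper that checks it. The class and method names are hypothetical. HTML names are assumed to be case-insensitive and folded to lower case; Html5ElementStack's actual canonicalization may do more than this. Non-namespaced XML names are returned unchanged, as XmlElementStack does.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

/** Hypothetical canonicalizers illustrating the idempotence contract. */
final class CanonicalNameSketch {
  private CanonicalNameSketch() {}

  // HTML-style: case-insensitive names folded to lower case. Locale.ENGLISH
  // avoids locale-sensitive folding such as the Turkish dotless i.
  static String canonicalizeHtmlName(String name) {
    return name.toLowerCase(Locale.ENGLISH);
  }

  // XML-style: names are case-sensitive, so they are returned unchanged.
  static String canonicalizeXmlName(String name) {
    return name;
  }

  // The contract: applying the canonicalizer twice equals applying it once.
  static boolean isIdempotentOn(UnaryOperator<String> canonicalizer, String input) {
    String once = canonicalizer.apply(input);
    return once.equals(canonicalizer.apply(once));
  }
}
```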
Added: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/parser/html/XmlElementStack.java
==============================================================================
--- (empty file)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/parser/html/XmlElementStack.java Thu Dec 20 08:56:04 2007
@@ -0,0 +1,132 @@
+// Copyright (C) 2007 Google Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package com.google.caja.parser.html;
+
+import com.google.caja.lexer.FilePosition;
+import com.google.caja.lexer.HtmlTokenType;
+import com.google.caja.lexer.Token;
+import com.google.caja.reporting.Message;
+import com.google.caja.reporting.MessagePart;
+import java.util.List;
+
+/**
+ * An element stack implementation for XML.
+ *
+ * @author mikes...@gmail.com
+ */
+class XmlElementStack extends AbstractElementStack {
+ XmlElementStack() {}
+
+ public String canonicalizeElementName(String elementName) {
+ // This will need to change if we start accepting namespaced XML.
+ return elementName;
+ }
+
+ public String canonicalizeAttributeName(String attributeName) {
+ // This will need to change if we start accepting namespaced XML.
+ return attributeName;
+ }
+
+ /** {@inheritDoc} */
+ public void processTag(Token<HtmlTokenType> start, Token<HtmlTokenType> end,
+ List<DomTree.Attrib> attrs)
+ throws IllegalDocumentStateException {
+ assert start.type == HtmlTokenType.TAGBEGIN;
+ assert end.type == HtmlTokenType.TAGEND;
+
+ boolean open = !start.text.startsWith("</");
+ String tagName
+ = canonicalizeElementName(start.text.substring(open ? 1 : 2));
+
+ processTag(tagName, open, start, end, attrs);
+ }
+
+ private void processTag(
+ String tagName, boolean open, Token<HtmlTokenType> start,
+ Token<HtmlTokenType> end, List<DomTree.Attrib> attrs)
+ throws IllegalDocumentStateException {
+ if (open) {
+ DomTree.Tag newElement = new DomTree.Tag(attrs, start, end);
+ push(newElement, tagName);
+
+ // Does the tag end immediately?
+ if ("/>".equals(end.text)) { popN(1, end.pos); }
+ } else {
+ String bottomElementName = getBottomElement().getValue();
+ if (!tagName.equals(bottomElementName)) {
+ throw new IllegalDocumentStateException(new Message(
+ DomParserMessageType.UNMATCHED_END,
+ start.pos, MessagePart.Factory.valueOf(start.text),
+ MessagePart.Factory.valueOf(bottomElementName)));
+ }
+ popN(1, end.pos);
+ }
+ }
+
+ /** {@inheritDoc} */
+ public void processText(Token<HtmlTokenType> text) {
+ DomTree parent = getBottomElement();
+
+ DomTree textNode;
+ switch (text.type) {
+ case CDATA:
+ textNode = new DomTree.CData(text);
+ break;
+ case TEXT:
+ {
+ List<? extends DomTree> siblings = parent.children();
+ if (!siblings.isEmpty()) {
+ DomTree lastSibling = siblings.get(siblings.size() - 1);
+ if (lastSibling instanceof DomTree.Text
+ && lastSibling.getToken().type == HtmlTokenType.TEXT) {
+ // Normalize the DOM by collapsing adjacent text nodes.
+ Token<HtmlTokenType> previous = lastSibling.getToken();
+ Token<HtmlTokenType> combined = Token.instance(
+ previous.text + text.text, previous.type,
+ FilePosition.span(previous.pos, text.pos));
+ parent.replaceChild(new DomTree.Text(combined), lastSibling);
+ return;
+ }
+ }
+ }
+ textNode = new DomTree.Text(text);
+ break;
+ case UNESCAPED:
+ textNode = new DomTree.Text(text);
+ break;
+ default:
+ throw new IllegalArgumentException(text.toString());
+ }
+ doAppend(textNode, parent);
+ }
+
+ /** {@inheritDoc} */
+ public void finish(FilePosition endOfDocument)
+ throws IllegalDocumentStateException {
+ stripIgnorableText();
+ DomTree root = getRootElement();
+ root.setFilePosition(
+ FilePosition.span(root.getFilePosition(), endOfDocument));
+
+ int nOpen = getNOpenElements();
+ if (nOpen != 1) {
+ DomTree.Tag openEl = getElement(nOpen - 1);
+ throw new IllegalDocumentStateException(new Message(
+ DomParserMessageType.MISSING_END, endOfDocument,
+ MessagePart.Factory.valueOf(openEl.getTagName()),
+ openEl.getFilePosition()));
+ }
+ }
+}
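The nesting rules that XmlElementStack enforces can be illustrated without DomTree. Below is a hypothetical, DOM-free sketch that tracks open elements by name only. A self-closing tag is pushed and then immediately popped. An end tag must match the innermost open element. finish rejects any element left open.

This sketch differs from the real class in two ways. The real class keeps the root fragment on the stack, so it expects exactly one open element at the end; here the root is implicit, so the sketch expects zero. Errors are also reported as plain IllegalStateExceptions rather than as IllegalDocumentStateException messages.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical name-only model of strict XML element nesting. */
final class XmlStackSketch {
  // Innermost open element is at the head of the deque.
  private final Deque<String> open = new ArrayDeque<String>();

  // startText is "<name" or "</name"; endText is ">" or "/>".
  void processTag(String startText, String endText) {
    boolean isOpen = !startText.startsWith("</");
    String name = startText.substring(isOpen ? 1 : 2);
    if (isOpen) {
      open.push(name);
      // A self-closing tag such as <baz/> ends immediately.
      if ("/>".equals(endText)) { open.pop(); }
    } else {
      String innermost = open.peek();  // null when nothing is open
      if (!name.equals(innermost)) {
        throw new IllegalStateException(
            "Unmatched end tag " + startText + ">, innermost open element: "
            + innermost);
      }
      open.pop();
    }
  }

  // Every element opened by a tag must have been closed by end of input.
  void finish() {
    if (!open.isEmpty()) {
      throw new IllegalStateException("Missing end tag for <" + open.peek() + ">");
    }
  }

  int openCount() { return open.size(); }
}
```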
Modified: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/plugin/ExpressionSanitizerCaja.java
==============================================================================
--- /trunk/src/java/com/google/caja/plugin/ExpressionSanitizerCaja.java (original)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/plugin/ExpressionSanitizerCaja.java Thu Dec 20 08:56:04 2007
@@ -16,7 +16,6 @@
import com.google.caja.parser.AbstractParseTreeNode;
import com.google.caja.parser.AncestorChain;
-import com.google.caja.parser.ParseTreeNodes;
import com.google.caja.parser.quasiliteral.DefaultRewriter;
import com.google.caja.reporting.MessageQueue;
Modified: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/plugin/GxpCompiler.java
==============================================================================
--- /trunk/src/java/com/google/caja/plugin/GxpCompiler.java (original)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/plugin/GxpCompiler.java Thu Dec 20 08:56:04 2007
@@ -394,6 +394,7 @@
mq.addMessage(PluginMessageType.MISSING_ATTRIBUTE,
attrEl.getFilePosition(),
MessagePart.Factory.valueOf("name"), t);
+ continue;
}
String name = assertHtmlIdentifier(nameT.getValue(), nameT);
AttributeXform xform = xformForAttribute(tagName, name);
Modified: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/plugin/HtmlPluginCompiler.java
==============================================================================
--- /trunk/src/java/com/google/caja/plugin/HtmlPluginCompiler.java (original)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/plugin/HtmlPluginCompiler.java Thu Dec 20 08:56:04 2007
@@ -108,7 +108,8 @@
private ObjectConstructor pluginPrivate;
public HtmlPluginCompiler(String nsName, String nsPrefix,
- String rootDivId, PluginMeta.TranslationScheme scheme) {
+ String rootDivId,
+ PluginMeta.TranslationScheme scheme) {
this(new PluginMeta(nsName, nsPrefix, rootDivId, scheme));
}
@@ -264,7 +265,8 @@
s(new FunctionDeclaration(
s(new Identifier(calloutName)),
s(new FunctionConstructor(
- s(new Identifier(calloutName)), Collections.<FormalParam>emptyList(),
+ s(new Identifier(calloutName)),
+ Collections.<FormalParam>emptyList(),
parsedScriptBody)))))));
scriptDelegate.setFilePosition(parsedScriptBody.getFilePosition());
@@ -275,7 +277,7 @@
// <script type="text/javascript">MyPlugin.tmp123__()</script>
DomTree.Tag callout;
{
- Token<HtmlTokenType> endToken = new Token<HtmlTokenType>(
+ Token<HtmlTokenType> endToken = Token.instance(
">", HtmlTokenType.TAGEND,
FilePosition.endOf(scriptTag.getFilePosition()));
callout = new DomTree.Tag(
@@ -362,7 +364,7 @@
List<CssTree> mediaChildren = new ArrayList<CssTree>();
for (String mediaType : mediaTypes) {
mediaChildren.add(
- new CssTree.Medium(type.getFilePosition(), mediaType));
+ new CssTree.Medium(media.getFilePosition(), mediaType));
}
mediaChildren.addAll(rules);
CssTree.Media mediaBlock = new CssTree.Media(
Modified: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/plugin/HtmlPluginCompilerMain.java
==============================================================================
--- /trunk/src/java/com/google/caja/plugin/HtmlPluginCompilerMain.java (original)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/plugin/HtmlPluginCompilerMain.java Thu Dec 20 08:56:04 2007
@@ -68,9 +68,6 @@
new Option("r", "root_div_id", true,
"ID of root <div> into which generated JS will inject content");
- private static final Option SCHEME =
- new Option("s", "caja", false, "Emit Baja code instead of Aaja code");
-
private static final Options options = new Options();
static {
Added: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/util/Join.java
==============================================================================
--- (empty file)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/java/com/google/caja/util/Join.java Thu Dec 20 08:56:04 2007
@@ -0,0 +1,87 @@
+// Copyright (C) 2007 Google Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package com.google.caja.util;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Iterator;
+
+/**
+ * Combines strings around a separator.
+ *
+ * @author mikes...@gmail.com
+ */
+public class Join {
+
+ /** Join items on separator. */
+ public static String join(CharSequence sep, CharSequence... items) {
+ int n = items.length;
+ int sumOfLengths = sep.length() * n;
+ for (int i = n; --i >= 0;) { sumOfLengths += items[i].length(); }
+ StringBuilder sb = new StringBuilder(sumOfLengths);
+ join(sb, sep, items);
+ return sb.toString();
+ }
+
+ /** Join items on separator. */
+ public static String join(
+ CharSequence sep, Iterable<? extends CharSequence> items) {
+ StringBuilder sb = new StringBuilder();
+ join(sb, sep, items);
+ return sb.toString();
+ }
+
+ /** Join items on separator, appending the result to out. */
+ public static void join(
+ Appendable out, CharSequence sep, Iterable<? extends CharSequence> items)
+ throws IOException {
+ Iterator<? extends CharSequence> it = items.iterator();
+ if (!it.hasNext()) { return; }
+
+ out.append(it.next());
+ while (it.hasNext()) {
+ out.append(sep).append(it.next());
+ }
+ }
+
+ /** Join items on separator, appending the result to out. */
+ public static void join(StringBuilder out, CharSequence sep,
+ Iterable<? extends CharSequence> items) {
+ try {
+ join((Appendable) out, sep, items);
+ } catch (IOException ex) {
+ // StringBuilder does not throw IOException
+ throw new RuntimeException(ex);
+ }
+ }
+
+ /** Join items on separator, appending the result to out. */
+ public static void join(
+ Appendable out, CharSequence sep, CharSequence... items)
+ throws IOException {
+ join(out, sep, Arrays.asList(items));
+ }
+
+ /** Join items on separator, appending the result to out. */
+ public static void join(
+ StringBuilder out, CharSequence sep, CharSequence... items) {
+ try {
+ join((Appendable) out, sep, Arrays.asList(items));
+ } catch (IOException ex) {
+ // StringBuilder does not throw IOException
+ throw new RuntimeException(ex);
+ }
+ }
+}
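The Appendable-based overloads of Join share one strategy. They append the first item, then append a separator before each remaining item, so there is never a trailing separator to trim. Below is a stand-alone sketch of that strategy. The class name JoinSketch is hypothetical; the class restates the logic rather than importing com.google.caja.util.Join.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;

/** Hypothetical restatement of the separator-before-item join strategy. */
final class JoinSketch {
  private JoinSketch() {}

  static void join(Appendable out, CharSequence sep,
                   Iterable<? extends CharSequence> items) throws IOException {
    Iterator<? extends CharSequence> it = items.iterator();
    if (!it.hasNext()) { return; }  // no items: append nothing
    out.append(it.next());
    while (it.hasNext()) {
      out.append(sep).append(it.next());
    }
  }

  // StringBuilder.append never throws IOException, so the checked exception
  // from the Appendable overload cannot actually occur here.
  static String join(CharSequence sep, CharSequence... items) {
    StringBuilder sb = new StringBuilder();
    try {
      join(sb, sep, Arrays.asList(items));
    } catch (IOException ex) {
      throw new AssertionError(ex);  // unreachable for StringBuilder
    }
    return sb.toString();
  }
}
```

Join's varargs String overload presizes its StringBuilder with `sep.length() * n` plus the item lengths. Because n items need only n - 1 separators, that estimate is slightly generous, which is harmless for a capacity hint.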
Modified: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/AllTests.java
==============================================================================
--- /trunk/src/javatests/com/google/caja/AllTests.java (original)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/AllTests.java Thu Dec 20 08:56:04 2007
@@ -46,6 +46,7 @@
import com.google.caja.plugin.PluginCompilerTest;
import com.google.caja.plugin.UrlUtilTest;
import com.google.caja.plugin.caps.CapabilityRewriterTest;
+import com.google.caja.util.JoinTest;
import com.google.caja.util.SparseBitSetTest;
import junit.framework.Test;
import junit.framework.TestCase;
@@ -73,6 +74,7 @@
CssTreeTest.class,
CssValidatorTest.class,
DefaultGadgetRewriterTest.class,
+ DefaultRewriterTest.class,
DomParserTest.class,
EscapingTest.class,
ExpressionSanitizerTest.class,
@@ -82,20 +84,20 @@
HtmlCompiledPluginTest.class,
HtmlLexerTest.class,
HtmlWhitelistTest.class,
+ JoinTest.class,
JsHtmlParserTest.class,
JsLexerTest.class,
LookaheadCharProducerTest.class,
+ MatchTest.class,
ParseTreeNodeTest.class,
ParserTest.class,
PluginCompilerTest.class,
PunctuationTrieTest.class,
+ QuasiBuilderTest.class,
SparseBitSetTest.class,
StringLiteralTest.class,
UrlUtilTest.class,
- MatchTest.class,
- QuasiBuilderTest.class,
- DefaultRewriterTest.class,
- };
+ };
Pattern testFilter = Pattern.compile(System.getProperty("test.filter", ""));
for (Class<? extends TestCase> testClass : testClasses) {
if (testFilter.matcher(testClass.getName()).find()) {
Modified: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/lexer/HtmlLexerTest.java
==============================================================================
--- /trunk/src/javatests/com/google/caja/lexer/HtmlLexerTest.java (original)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/lexer/HtmlLexerTest.java Thu Dec 20 08:56:04 2007
@@ -46,6 +46,29 @@
assertEquals(golden, actual.toString());
}
+ public void testXmlLexer() throws Exception {
+ // Read the input.
+ InputSource input = new InputSource(
+ TestUtil.getResource(getClass(), "htmllexerinput2.xml"));
+
+ // Do the lexing.
+ CharProducer p = CharProducer.Factory.create(input);
+ StringBuilder actual = new StringBuilder();
+ try {
+ HtmlLexer lexer = new HtmlLexer(p);
+ lexer.setTreatedAsXml(true);
+ lex(lexer, actual);
+ } finally {
+ p.close();
+ }
+
+ // Get the golden.
+ String golden = TestUtil.readResource(getClass(), "htmllexergolden2.txt");
+
+ // Compare.
+ assertEquals(golden, actual.toString());
+ }
+
private void lex(HtmlLexer lexer, Appendable out) throws Exception {
int maxTypeLength = 0;
Modified: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/lexer/htmllexergolden1.txt
==============================================================================
--- /trunk/src/javatests/com/google/caja/lexer/htmllexergolden1.txt (original)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/lexer/htmllexergolden1.txt Thu Dec 20 08:56:04 2007
@@ -71,7 +71,7 @@
ATTRNAME [name] : htmllexerinput1.html:35+11@766 - 15@770
ATTRVALUE ["onchange"] : htmllexerinput1.html:35+16@771 - 26@781
TAGEND [>] : htmllexerinput1.html:35+26@781 - 27@782
-CDATA [<![CDATA[alert("<b>hi</b>");]]>] : htmllexerinput1.html:35+27@782 - 58@813
+TEXT [alert("<b>hi</b>");] : htmllexerinput1.html:35+27@782 - 58@813
TAGBEGIN [</gxp:attr] : htmllexerinput1.html:35+58@813 - 68@823
TAGEND [>] : htmllexerinput1.html:35+68@823 - 69@824
TEXT [\n] : htmllexerinput1.html:35+69@824 - 36+1@825
@@ -131,4 +131,6 @@
TEXT [\n] : htmllexerinput1.html:60+8@1320 - 61+1@1321
TAGBEGIN [</html] : htmllexerinput1.html:61+1@1321 - 7@1327
TAGEND [>] : htmllexerinput1.html:61+7@1327 - 8@1328
-TEXT [\n] : htmllexerinput1.html:61+8@1328 - 62+1@1329
+TEXT [\n\n] : htmllexerinput1.html:61+8@1328 - 63+1@1330
+DIRECTIVE [<![CDATA[ No such thing as a CDATA>] : htmllexerinput1.html:63+1@1330 - 36@1365
+TEXT [ section in HTML ]]>\n] : htmllexerinput1.html:63+36@1365 - 64+1@1386
Added: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/lexer/htmllexergolden2.txt
==============================================================================
--- (empty file)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/lexer/htmllexergolden2.txt Thu Dec 20 08:56:04 2007
@@ -0,0 +1,41 @@
+DIRECTIVE [<?xml version="1.0" ?>] : htmllexerinput2.xml:1+1@1 - 23@23
+TEXT [\n\n] : htmllexerinput2.xml:1+23@23 - 3+1@25
+DIRECTIVE [<!DOCTYPE foo>] : htmllexerinput2.xml:3+1@25 - 15@39
+TEXT [\n\n] : htmllexerinput2.xml:3+15@39 - 5+1@41
+COMMENT [<!-- A Comment -->] : htmllexerinput2.xml:5+1@41 - 19@59
+TEXT [\n\n] : htmllexerinput2.xml:5+19@59 - 7+1@61
+TAGBEGIN [<foo] : htmllexerinput2.xml:7+1@61 - 5@65
+TAGEND [>] : htmllexerinput2.xml:7+5@65 - 6@66
+TEXT [\n\n ] : htmllexerinput2.xml:7+6@66 - 9+3@70
+TAGBEGIN [<title] : htmllexerinput2.xml:9+3@70 - 9@76
+TAGEND [>] : htmllexerinput2.xml:9+9@76 - 10@77
+TEXT [Not RCDATA] : htmllexerinput2.xml:9+10@77 - 20@87
+TAGBEGIN [</title] : htmllexerinput2.xml:9+20@87 - 27@94
+TAGEND [>] : htmllexerinput2.xml:9+27@94 - 28@95
+TEXT [\n\n ] : htmllexerinput2.xml:9+28@95 - 11+3@99
+TAGBEGIN [<bar] : htmllexerinput2.xml:11+3@99 - 7@103
+TAGEND [>] : htmllexerinput2.xml:11+7@103 - 8@104
+TEXT [Bar & bar] : htmllexerinput2.xml:11+8@104 - 21@117
+TAGBEGIN [</bar] : htmllexerinput2.xml:11+21@117 - 26@122
+TAGEND [>] : htmllexerinput2.xml:11+26@122 - 27@123
+TEXT [\n\n ] : htmllexerinput2.xml:11+27@123 - 13+3@127
+TAGBEGIN [<baz] : htmllexerinput2.xml:13+3@127 - 7@131
+TAGEND [/>] : htmllexerinput2.xml:13+7@131 - 9@133
+TEXT [\n\n ] : htmllexerinput2.xml:13+9@133 - 15+3@137
+TAGBEGIN [<boo] : htmllexerinput2.xml:15+3@137 - 7@141
+TAGEND [>] : htmllexerinput2.xml:15+7@141 - 8@142
+CDATA [<![CDATA[ 1 < 2 && 4 ]> 3 ]]>] : htmllexerinput2.xml:15+8@142 - 41@175
+TAGBEGIN [</boo] : htmllexerinput2.xml:15+41@175 - 46@180
+TAGEND [>] : htmllexerinput2.xml:15+46@180 - 47@181
+TEXT [\n\n ] : htmllexerinput2.xml:15+47@181 - 17+3@185
+TAGBEGIN [<plaintext] : htmllexerinput2.xml:17+3@185 - 13@195
+ATTRNAME [attrib] : htmllexerinput2.xml:17+14@196 - 20@202
+ATTRVALUE ["value"] : htmllexerinput2.xml:17+21@203 - 28@210
+TAGEND [>] : htmllexerinput2.xml:17+28@210 - 29@211
+TEXT [Not CDATA] : htmllexerinput2.xml:17+29@211 - 38@220
+TAGBEGIN [</plaintext] : htmllexerinput2.xml:17+38@220 - 49@231
+TAGEND [>] : htmllexerinput2.xml:17+49@231 - 50@232
+TEXT [\n\n] : htmllexerinput2.xml:17+50@232 - 19+1@234
+TAGBEGIN [</foo] : htmllexerinput2.xml:19+1@234 - 6@239
+TAGEND [>] : htmllexerinput2.xml:19+6@239 - 7@240
+TEXT [\n] : htmllexerinput2.xml:19+7@240 - 20+1@241
Modified: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/lexer/htmllexerinput1.html
==============================================================================
--- /trunk/src/javatests/com/google/caja/lexer/htmllexerinput1.html (original)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/lexer/htmllexerinput1.html Thu Dec 20 08:56:04 2007
@@ -32,7 +32,7 @@
</div>
<input id=foo>
-<gxp:attr name="onchange"><![CDATA[alert("<b>hi</b>");]]></gxp:attr>
+<gxp:attr name="onchange">alert("<b>hi</b>");</gxp:attr>
</input>
<pre><div id=notarealtag onclick=notcode()></pre>
@@ -59,3 +59,5 @@
</body>
</html>
+
+<![CDATA[ No such thing as a CDATA> section in HTML ]]>
Added: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/lexer/htmllexerinput2.xml
==============================================================================
--- (empty file)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/lexer/htmllexerinput2.xml Thu Dec 20 08:56:04 2007
@@ -0,0 +1,19 @@
+<?xml version="1.0" ?>
+
+<!DOCTYPE foo>
+
+<!-- A Comment -->
+
+<foo>
+
+ <title>Not RCDATA</title>
+
+ <bar>Bar & bar</bar>
+
+ <baz/>
+
+ <boo><![CDATA[ 1 < 2 && 4 ]> 3 ]]></boo>
+
+ <plaintext attrib="value">Not CDATA</plaintext>
+
+</foo>
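The new XML input targets the places where XML and HTML lexing diverge. In HTML, `title` and `textarea` content is RCDATA (entities decoded, tags not recognized), elements such as `script`, `style` and `noframes` hold raw CDATA, `plaintext` swallows the rest of the document, and `<![CDATA[` is not a CDATA section at all. In XML, every element's content is ordinary parsed character data and only an explicit `<![CDATA[...]]>` section is raw. The sketch below is a hypothetical illustration of that classification based on the HTML5 draft's element categories; it is not the contents of the `HtmlTextEscapingMode` class added in this change.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical classification of element content; not Caja's HtmlTextEscapingMode. */
public final class TextModes {
  enum Mode { PCDATA, RCDATA, CDATA, PLAIN_TEXT }

  // Entities are decoded but markup is not recognized.
  private static final Set<String> RCDATA_ELEMENTS =
      new HashSet<String>(Arrays.asList("title", "textarea"));

  // Content is raw text up to the matching end tag.
  private static final Set<String> CDATA_ELEMENTS = new HashSet<String>(Arrays.asList(
      "script", "style", "xmp", "iframe", "noembed", "noframes",
      "noscript"));  // noscript: as parsed by a browser with scripting enabled

  static Mode modeFor(String tagName, boolean asXml) {
    if (asXml) {
      // In XML, raw text only occurs inside explicit <![CDATA[...]]> sections.
      return Mode.PCDATA;
    }
    String name = tagName.toLowerCase(Locale.ENGLISH);  // HTML names are case-insensitive
    if (RCDATA_ELEMENTS.contains(name)) { return Mode.RCDATA; }
    if (CDATA_ELEMENTS.contains(name)) { return Mode.CDATA; }
    if ("plaintext".equals(name)) { return Mode.PLAIN_TEXT; }  // never ends
    return Mode.PCDATA;
  }

  public static void main(String[] args) {
    System.out.println(modeFor("title", false) + " " + modeFor("title", true));
    // RCDATA PCDATA
  }
}
```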
Modified: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/opensocial/example-rewritten.xml
==============================================================================
--- /trunk/src/javatests/com/google/caja/opensocial/example-rewritten.xml (original)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/opensocial/example-rewritten.xml	Thu Dec 20 08:56:04 2007
@@ -33,7 +33,7 @@
___OUTERS___.c_5___ = function (event) {
___OUTERS___.handleClicky();
};
- ___OUTERS___.emitHtml___('\n\n \n \n\n \n \n\n \n \n\n \n \n\n \n \074p class=\"DOM-PREFIX-p1 DOM-PREFIX-p2\"\076Paragraph 1\074/p\076\n\n ');
+ ___OUTERS___.emitHtml___('\n\n \n \n\n \n \n\n \n \n\n \n \074p class=\"DOM-PREFIX-p1 DOM-PREFIX-p2\"\076Paragraph 1\074/p\076\n\n ');
scriptElement_1___();
___OUTERS___.emitHtml___('\n\n \n \074p id=\"DOM-PREFIX-p3\"\076Paragraph 2\074/p\076\n\n \n \n\n \n ');
scriptElement_3___();
Modified: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/parser/css/CssParserTest.java
==============================================================================
--- /trunk/src/javatests/com/google/caja/parser/css/CssParserTest.java (original)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/parser/css/CssParserTest.java	Thu Dec 20 08:56:04 2007
@@ -39,29 +39,29 @@
InputSource is = new InputSource(URI.create("test:///"));
FilePosition pos = FilePosition.instance(is, 1, 1, 1, 1);
assertEquals("", CssParser.unescape(
- new Token<CssTokenType>("", CssTokenType.IDENT, pos)));
+ Token.instance("", CssTokenType.IDENT, pos)));
assertEquals("foo", CssParser.unescape(
- new Token<CssTokenType>("foo", CssTokenType.IDENT, pos)));
+ Token.instance("foo", CssTokenType.IDENT, pos)));
assertEquals("foo", CssParser.unescape(
- new Token<CssTokenType>("f\\oo", CssTokenType.IDENT, pos)));
+ Token.instance("f\\oo", CssTokenType.IDENT, pos)));
assertEquals("!important", CssParser.unescape(
- new Token<CssTokenType>("! important", CssTokenType.IDENT, pos)));
+ Token.instance("! important", CssTokenType.IDENT, pos)));
assertEquals("!important", CssParser.unescape(
- new Token<CssTokenType>("! important", CssTokenType.IDENT, pos)));
+ Token.instance("! important", CssTokenType.IDENT, pos)));
assertEquals("'foo bar'", CssParser.unescape(
- new Token<CssTokenType>("'foo bar'", CssTokenType.STRING, pos)));
+ Token.instance("'foo bar'", CssTokenType.STRING, pos)));
assertEquals("'foo bar'", CssParser.unescape(
- new Token<CssTokenType>("'foo\\ bar'", CssTokenType.STRING, pos)));
+ Token.instance("'foo\\ bar'", CssTokenType.STRING, pos)));
assertEquals("'foo bar'", CssParser.unescape(
- new Token<CssTokenType>("'foo\\ b\\\nar'", CssTokenType.STRING, pos)));
+ Token.instance("'foo\\ b\\\nar'", CssTokenType.STRING, pos)));
assertEquals("'foo bar'", CssParser.unescape(
- new Token<CssTokenType>("'foo\\ b\\\rar'", CssTokenType.STRING, pos)));
+ Token.instance("'foo\\ b\\\rar'", CssTokenType.STRING, pos)));
assertEquals("'ffoo bar'", CssParser.unescape(
- new Token<CssTokenType>("'\\66 foo bar'", CssTokenType.STRING, pos)));
+ Token.instance("'\\66 foo bar'", CssTokenType.STRING, pos)));
assertEquals("foo-bar", CssParser.unescape(
- new Token<CssTokenType>("\\66oo-ba\\0072", CssTokenType.IDENT, pos)));
+ Token.instance("\\66oo-ba\\0072", CssTokenType.IDENT, pos)));
assertEquals("\\66oo-bar", CssParser.unescape(
- new Token<CssTokenType>("\\\\66oo-ba\\0072", CssTokenType.IDENT, pos)));
+ Token.instance("\\\\66oo-ba\\0072", CssTokenType.IDENT, pos)));
}
public void testCssParser1() throws Exception {
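The `CssParserTest` hunk replaces `new Token<CssTokenType>(...)` with `Token.instance(...)`. A static generic factory lets the compiler infer the type argument from the `CssTokenType` parameter, so call sites no longer repeat it; Java at the time had no diamond operator for constructors. The signature of Caja's `Token.instance` is not shown in this diff, so the `SimpleToken` class below is a hypothetical, self-contained illustration of the idiom, not Caja's `Token`.

```java
/** Hypothetical token class illustrating a generic static factory; not Caja's Token. */
public final class SimpleToken<T extends Enum<T>> {
  final String text;
  final T type;

  // Private constructor: callers go through the factory below.
  private SimpleToken(String text, T type) {
    this.text = text;
    this.type = type;
  }

  /** The type argument T is inferred from {@code type} at each call site. */
  static <T extends Enum<T>> SimpleToken<T> instance(String text, T type) {
    return new SimpleToken<T>(text, type);
  }

  enum CssKind { IDENT, STRING }

  public static void main(String[] args) {
    // With a public constructor this would read: new SimpleToken<CssKind>("foo", CssKind.IDENT)
    SimpleToken<CssKind> t = instance("foo", CssKind.IDENT);
    System.out.println(t.type + " " + t.text);  // IDENT foo
  }
}
```

Keeping the constructor private also leaves room to change how tokens are created (for example, caching or validation) without touching call sites.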
Modified: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/parser/html/DomParserTest.java
==============================================================================
--- /trunk/src/javatests/com/google/caja/parser/html/DomParserTest.java (original)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/parser/html/DomParserTest.java	Thu Dec 20 08:56:04 2007
@@ -15,21 +15,36 @@
package com.google.caja.parser.html;
import com.google.caja.lexer.CharProducer;
+import com.google.caja.lexer.FilePosition;
import com.google.caja.lexer.HtmlLexer;
import com.google.caja.lexer.HtmlTokenType;
import com.google.caja.lexer.InputSource;
+import com.google.caja.lexer.ParseException;
import com.google.caja.lexer.Token;
import com.google.caja.lexer.TokenQueue;
+import com.google.caja.reporting.Message;
import com.google.caja.reporting.MessageContext;
+import com.google.caja.reporting.MessageQueue;
+import com.google.caja.reporting.RenderContext;
+import com.google.caja.reporting.SimpleMessageQueue;
import com.google.caja.util.Criterion;
+import com.google.caja.util.Join;
+import static com.google.caja.util.MoreAsserts.*;
+import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.IdentityHashMap;
+import java.util.List;
import junit.framework.TestCase;
/**
* testcase for {@link DomParser}.
+ * http://james.html5.org/parsetree.html is a useful resource when writing
+ * HTML-related tests.
*
* @author mikes...@gmail.com
*/
@@ -78,32 +93,1542 @@
+ "";
public void testParseDom() throws Exception {
- InputSource is = new InputSource(
- URI.create("test:///" + DomParserTest.class.getName()));
- TokenQueue<HtmlTokenType> tq;
- if (false) {
- CharProducer cp = CharProducer.Factory.create(
- new StringReader(DOM1_XML), is);
- HtmlLexer lexer = new HtmlLexer(cp);
- lexer.setTreatedAsXml(true);
- tq = new TokenQueue<HtmlTokenType>(
- lexer, is, Criterion.Factory.<Token<HtmlTokenType>>optimist());
- while (!tq.isEmpty()) {
- Token<HtmlTokenType> t = tq.pop();
- System.err.println("t.type=" + t.type + ", text=[" + t.text + "]");
- }
- }
- {
- CharProducer cp = CharProducer.Factory.create(
- new StringReader(DOM1_XML), is);
- HtmlLexer lexer = new HtmlLexer(cp);
- lexer.setTreatedAsXml(true);
- tq = new TokenQueue<HtmlTokenType>(
- lexer, is, Criterion.Factory.<Token<HtmlTokenType>>optimist());
- }
+ TokenQueue<HtmlTokenType> tq = tokenizeTestInput(DOM1_XML, true);
DomTree t = DomParser.parseDocument(tq);
StringBuilder actual = new StringBuilder();
t.format(new MessageContext(), actual);
assertEquals(DOM1_GOLDEN, actual.toString());
+ }
+
+ public void testHtml1() throws Exception {
+ assertParsedHtml(
+ Arrays.asList(
+ "<html><head>",
+ "",
+ "<title>Foo<a> & Bar & Baz</a></title>",
+ "",
+ "</head>",
+ "<body bgcolor=white>",
+ " <BoDY onload=panic()>",
+ "<body onerror=panic() onload=dontpanic()>",
+ "",
+ "</body>",
+ "<taBLe>",
+ "<td>Howdy</html></tablE>"),
+ Arrays.asList(
+ "Tag : html 1+1-12+25", // </html> inside <table> ignored
+ " Tag : head 1+7-5+8",
+ " Text : \n\n 1+13-3+1",
+ " Tag : title 3+1-3+42",
+ " Text : Foo<a> & Bar & Baz</a> 3+8-3+34",
+ " Text : \n\n 3+42-5+1",
+ " Text : \n 5+8-6+1",
+ " Tag : body 6+1-12+25",
+ " Attrib : bgcolor 6+7-6+14",
+ " Value : white 6+15-6+20",
+ // Include attributes folded in from other body tags
+ " Attrib : onload 7+9-7+15",
+ " Value : panic() 7+16-7+23",
+ " Attrib : onerror 8+7-8+14",
+ " Value : panic() 8+15-8+22",
+ // Normalized text from in between body tags
+ " Text : \n \n\n\n\n 6+21-11+1",
+ " Tag : table 11+1-12+25", // Name is canonicalized
+ " Text : \n 11+8-12+1",
+ " Tag : tbody 12+1-12+17",
+ " Tag : tr 12+1-12+17",
+ " Tag : td 12+1-12+17",
+ " Text : Howdy 12+5-12+10"
+ ),
+ Arrays.asList(
+ "LINT testHtml1:7+3 - 24: "
+ + "'body' start tag found but the 'body' element is already open.",
+ "LINT testHtml1:8+1 - 42: "
+ + "'body' start tag found but the 'body' element is already open.",
+ "LINT testHtml1:11+1 - 8: Stray 'table' start tag.",
+ "LINT testHtml1:12+1 - 5: 'td' start tag in table body.",
+ "LINT testHtml1:12+10 - 17: Stray end tag 'html'."
+ ),
+ Arrays.asList(
+ "<html><head>",
+ "",
+ // Entities in title consistently escaped
+ "<title>Foo<a> & Bar & Baz</a></title>",
+ "",
+ "</head>",
+ // Merged attributes
+ "<body bgcolor=\"white\" onload=\"panic()\" onerror=\"panic()\">",
+ " ",
+ "",
+ "",
+ "",
+ "<table>",
+ // Implied body and rows in table
+ "<tbody><tr><td>Howdy</td></tr></tbody></table></body></html>"
+ )
+ );
+ }
+
+ public void testHtml2() throws Exception {
+ assertParsedHtml(
+ Arrays.asList(
+ "<html>",
+ " <head>",
+ " <link rel=stylesheet href=styles-1.css>",
+ " <meta http-equiv=charset content=utf-8 />",
+ " </head>",
+ " <script src='foo.js'></script>",
+ " <body>",
+ " <script type='text/javascript'>//<![CDATA[",
+ " foo() && bar();",
+ " //]]></script>",
+ "</html>"),
+ Arrays.asList(
+ "Tag : html 1+1-11+8",
+ " Text : \n 1+7-2+3",
+ " Tag : head 2+3-6+24",
+ " Text : \n 2+9-3+5",
+ " Tag : link 3+5-3+44",
+ " Attrib : rel 3+11-3+14",
+ " Value : stylesheet 3+15-3+25",
+ " Attrib : href 3+26-3+30",
+ " Value : styles-1.css 3+31-3+43",
+ " Text : \n 3+44-4+5",
+ " Tag : meta 4+5-4+46",
+ " Attrib : http-equiv 4+11-4+21",
+ " Value : charset 4+22-4+29",
+ " Attrib : content 4+30-4+37",
+ " Value : utf-8 4+38-4+43",
+ " Text : \n 4+46-5+3",
+ " Tag : script 6+3-6+33",
+ " Attrib : src 6+11-6+14",
+ " Value : foo.js 6+15-6+23",
+ " Text : \n \n 5+10-7+3",
+ " Tag : body 7+3-11+1",
+ " Text : \n 7+9-8+5",
+ " Tag : script 8+5-10+19",
+ " Attrib : type 8+13-8+17",
+ " Value : text/javascript 8+18-8+35",
+ (" Text : //<![CDATA[\n foo() && bar();\n //]]>"
+ + " 8+36-10+10"),
+ " Text : \n 10+19-11+1"
+ ),
+ Arrays.asList(
+ "LINT testHtml2:6+3 - 24:"
+ + " 'script' element between 'head' and 'body'."
+ ),
+ Arrays.asList(
+ "<html>",
+ " <head>",
+ " <link rel=\"stylesheet\" href=\"styles-1.css\" />",
+ " <meta http-equiv=\"charset\" content=\"utf-8\" />",
+ " <script src=\"foo.js\"></script></head>",
+ " ",
+ " <body>",
+ " <script type=\"text/javascript\">//<![CDATA[",
+ " foo() && bar();",
+ " //]]></script>",
+ "</body></html>"
+ )
+ );
+ }
+
+ public void testBeforeHead() throws Exception {
+ assertParsedHtml(
+ Arrays.asList(
+ "<html><SCRIPT>foo()</scriPt></html>"),
+ Arrays.asList(
+ "Tag : html 1+1-1+36",
+ " Tag : head 1+7-1+29",
+ " Tag : script 1+7-1+29",
+ " Text : foo() 1+15-1+20",
+ " Tag : body 1+29-1+29"
+ ),
+ Arrays.<String>asList(
+ ),
+ Arrays.asList(
+ "<html><head><script>foo()</script></head><body></body></html>"
+ )
+ );
+ }
+
+ public void testMinimalHtml() throws Exception {
+ assertParsedHtml(
+ Arrays.asList(
+ "<html></html>"),
+ Arrays.asList(
+ "Tag : html 1+1-1+14",
+ " Tag : head 1+7-1+7",
+ " Tag : body 1+7-1+7"
+ ),
+ Arrays.<String>asList(
+ ),
+ Arrays.asList(
+ "<html><head></head><body></body></html>"
+ )
+ );
+ }
+
+ public void testMinimalFrameset() throws Exception {
+ assertParsedHtml(
+ Arrays.asList(
+ "<html><frameset></frameset></html>"),
+ Arrays.asList(
+ "Tag : html 1+1-1+35",
+ " Tag : head 1+7-1+7",
+ " Tag : frameset 1+7-1+28"
+ ),
+ Arrays.<String>asList(
+ ),
+ Arrays.asList(
+ "<html><head></head><frameset></frameset></html>"
+ )
+ );
+ }
+
+ public void testSpuriousCloseTagBeforeHead() throws Exception {
+ assertParsedHtml(
+ Arrays.asList(
+ "<html></script></html>"),
+ Arrays.asList(
+ "Tag : html 1+1-1+23",
+ " Tag : head 1+16-1+16",
+ " Tag : body 1+16-1+16"
+ ),
+ Arrays.asList(
+ ("LINT testSpuriousCloseTagBeforeHead:1+7 - 16"
+ + ": Stray end tag 'script'.")
+ ),
+ Arrays.asList(
+ "<html><head></head><body></body></html>"
+ )
+ );
+ }
+
+ public void testHeadless() throws Exception {
+ assertParsedHtml(
+ Arrays.asList(
+ "<html><body>Hello World</body></html>"),
+ Arrays.asList(
+ "Tag : html 1+1-1+38",
+ " Tag : head 1+7-1+7",
+ " Tag : body 1+7-1+31",
+ " Text : Hello World 1+13-1+24"
+ ),
+ Arrays.<String>asList(
+ ),
+ Arrays.asList(
+ "<html><head></head><body>Hello World</body></html>"
+ )
+ );
+ }
+
+ public void testLooseStyleTagEndsInHead() throws Exception {
+ assertParsedHtml(
+ Arrays.asList(
+ "<html><style>",
+ "body:before { content: 'Hello ' }",
+ "body:after { content: \"World\" }",
+ "</style></html>"),
+ Arrays.asList(
+ "Tag : html 1+1-4+16",
+ " Tag : head 1+7-4+9",
+ " Tag : style 1+7-4+9",
+ (" Text : \nbody:before { content: 'Hello ' }"
+ + "\nbody:after { content: \"World\" }\n 1+14-4+1"),
+ " Tag : body 4+9-4+9"
+ ),
+ Arrays.<String>asList(
+ ),
+ Arrays.asList(
+ "<html><head><style>",
+ "body:before { content: 'Hello ' }",
+ "body:after { content: \"World\" }",
+ "</style></head><body></body></html>"
+ )
+ );
+ }
+
+ public void testDoubleHeadedHtml() throws Exception {
+ assertParsedHtml(
+ Arrays.asList(
+ "<html><head><head><body></body></html>"),
+ Arrays.asList(
+ "Tag : html 1+1-1+39",
+ " Tag : head 1+7-1+19",
+ " Tag : body 1+19-1+32"
+ ),
+ Arrays.asList(
+ "LINT testDoubleHeadedHtml:1+13 - 19:"
+ + " Start tag for 'head' seen when 'head' was already open."
+ ),
+ Arrays.asList(
+ "<html><head></head><body></body></html>"
+ )
+ );
+ }
+
+ public void testBodyFragment() throws Exception {
+ assertParsedHtmlFragment(
+ Arrays.asList(
+ "<body><p><Bar><p>Baz</body>"),
+ Arrays.asList(
+ "Fragment 1+1-1+37",
+ " Tag : p 1+7-1+24",
+ " Text : <Bar> 1+10-1+24", // Text contains decoded value
+ " Tag : p 1+24-1+37",
+ " Text : Baz 1+27-1+30"
+ ),
+ Arrays.<String>asList(
+ ),
+ Arrays.asList(
+ "<p><Bar></p><p>Baz</p>"
+ )
+ );
+ }
+
+ public void testTableFragment() throws Exception {
+ assertParsedHtmlFragment(
+ Arrays.asList(
+ "<table>",
+ "<tr><th>Hi<td>There",
+ "</table>"),
+ Arrays.asList(
+ "Fragment 1+1-3+9",
+ " Tag : table 1+1-3+9",
+ " Text : \n 1+8-2+1",
+ " Tag : tbody 2+1-3+1",
+ " Tag : tr 2+1-3+1",
+ " Tag : th 2+5-2+11",
+ " Text : Hi 2+9-2+11",
+ " Tag : td 2+11-3+1",
+ " Text : There\n 2+15-3+1"
+ ),
+ Arrays.<String>asList(
+ ),
+ Arrays.asList(
+ "<table>",
+ "<tbody><tr><th>Hi</th><td>There",
+ "</td></tr></tbody></table>"
+ )
+ );
+ }
+
+ public void testFragmentWithClose() throws Exception {
+ assertParsedHtmlFragment(
+ Arrays.asList(
+ "</head>", // Nothing here but text
+ "<body>Foo"),
+ Arrays.asList(
+ "Fragment 1+1-2+10",
+ " Text : \nFoo 1+8-2+10"
+ ),
+ Arrays.<String>asList(
+ ),
+ Arrays.asList(
+ "",
+ "Foo"
+ )
+ );
+ }
+
+ public void testMisplacedTitle() throws Exception {
+ assertParsedHtml(
+ Arrays.asList(
+ "<html><body><title>What head?</title></body></html>"),
+ Arrays.asList(
+ "Tag : html 1+1-1+52",
+ " Tag : head 1+7-1+30",
+ " Tag : title 1+13-1+38",
+ " Text : What head? 1+20-1+30",
+ " Tag : body 1+7-1+45"
+ ),
+ Arrays.<String>asList(
+ "LINT testMisplacedTitle:1+13 - 20:"
+ + " 'title' element found inside 'body'."
+ ),
+ Arrays.asList(
+ "<html><head><title>What head?</title></head><body></body></html>"
+ )
+ );
+ }
+
+ public void testBodyWithAttributeInFragment() throws Exception {
+ assertParsedHtmlFragment(
+ Arrays.asList( // A fragment with body attributes is a whole document
+ "<div>",
+ "<p>Foo<",
+ "<body bgcolor=white>",
+ "<p>Bar</body>"),
+ Arrays.asList(
+ "Fragment 1+1-4+14",
+ " Tag : html 1+1-4+14",
+ " Tag : head 1+1-1+1",
+ " Tag : body 1+1-4+14",
+ " Attrib : bgcolor 3+7-3+14",
+ " Value : white 3+15-3+20",
+ " Tag : div 1+1-4+14",
+ " Text : \n 1+6-2+1",
+ " Tag : p 2+1-4+1",
+ " Text : Foo<\n\n 2+4-4+1",
+ " Tag : p 4+1-4+14",
+ " Text : Bar 4+4-4+7"
+ ),
+ Arrays.asList(
+ "LINT testBodyWithAttributeInFragment:3+1 - 21:"
+ + " 'body' start tag found but the 'body' element is already open.",
+ "LINT testBodyWithAttributeInFragment:4+7 - 14:"
+ + " End tag for 'body' seen but there were unclosed elements."
+ ),
+ Arrays.asList(
+ "<html><head></head><body bgcolor=\"white\"><div>",
+ "<p>Foo<",
+ "",
+ "</p><p>Bar</p></div></body></html>"
+ )
+ );
+ }
+
+ public void testListFragment() throws Exception {
+ assertParsedHtmlFragment(
+ Arrays.asList(
+ // </br> is treated as a <br>. <br></br> -- 2 for the price of 1!
+ "<div></br>",
+ "<p>Foo<br>",
+ "</html >",
+ "<ul><li>One</ul>"
+ ),
+ Arrays.asList(
+ "Fragment 1+1-4+17",
+ " Tag : div 1+1-4+17",
+ " Tag : br 1+6-1+11",
+ " Text : \n 1+11-2+1",
+ " Tag : p 2+1-4+1",
+ " Text : Foo 2+4-2+7",
+ " Tag : br 2+7-2+11",
+ " Text : \n\n 2+11-4+1",
+ " Tag : ul 4+1-4+17",
+ " Tag : li 4+5-4+12",
+ " Text : One 4+9-4+12"
+ ),
+ Arrays.asList(
+ "LINT testListFragment:1+6 - 11: End tag 'br'.",
+ "LINT testListFragment:3+1 - 9:"
+ + " End tag for 'html' seen but there were unclosed elements.",
+ "LINT testListFragment:4+1 - 5: Stray 'ul' start tag.",
+ "LINT testListFragment:4+17:"
+ + " End of file seen and there were open elements."
+ ),
+ Arrays.asList(
+ "<div><br />",
+ // </br> is interpreted as <br> so we need to make sure <br> does
+ // not have an end tag.
+ "<p>Foo<br />",
+ "",
+ "</p><ul><li>One</li></ul></div>"
+ )
+ );
+ }
+
+ public void testFormsNotNested() throws Exception {
+ assertParsedHtmlFragment(
+ Arrays.asList(
+ "<form action= 'fly'>",
+ "<input type=radio name=method value=rockets>",
+ "<form action=walk>",
+ "<input type=radio name=method value=wings>",
+ "<input type=submit value ='Take off'>",
+ "</form></form>"
+ ),
+ Arrays.asList(
+ "Fragment 1+1-6+15",
+ " Tag : form 1+1-6+8",
+ " Attrib : action 1+7-1+13",
+ " Value : fly 1+15-1+20",
+ " Text : \n 1+21-2+1",
+ " Tag : input 2+1-2+45",
+ " Attrib : type 2+8-2+12",
+ " Value : radio 2+13-2+18",
+ " Attrib : name 2+19-2+23",
+ " Value : method 2+24-2+30",
+ " Attrib : value 2+31-2+36",
+ " Value : rockets 2+37-2+44",
+ " Text : \n\n 2+45-4+1",
+ " Tag : input 4+1-4+43",
+ " Attrib : type 4+8-4+12",
+ " Value : radio 4+13-4+18",
+ " Attrib : name 4+19-4+23",
+ " Value : method 4+24-4+30",
+ " Attrib : value 4+31-4+36",
+ " Value : wings 4+37-4+42",
+ " Text : \n 4+43-5+1",
+ " Tag : input 5+1-5+38",
+ " Attrib : type 5+8-5+12",
+ " Value : submit 5+13-5+19",
+ " Attrib : value 5+20-5+25",
+ " Value : Take off 5+27-5+37",
+ " Text : \n 5+38-6+1"
+ ),
+ Arrays.asList(
+ "LINT testFormsNotNested:3+1 - 19: Saw a 'form' start tag"
+ + ", but there was already an active 'form' element.",
+ "LINT testFormsNotNested:6+8 - 15: End tag 'form' seen but"
+ + " there were unclosed elements."
+ ),
+ Arrays.asList(
+ "<form action=\"fly\">",
+ "<input type=\"radio\" name=\"method\" value=\"rockets\" />",
+ "",
+ "<input type=\"radio\" name=\"method\" value=\"wings\" />",
+ "<input type=\"submit\" value=\"Take off\" />",
+ "</form>"
+ )
+ );
+ }
+
+ public void testListNesting() throws Exception {
+ assertParsedHtmlFragment(
+ Arrays.asList(
+ "<ul id=unordered-list>",
+ "<li>From Chaos,</li></li>", // Warn on useless </li>
+ "<li>and Disorder; arise",
+ "<li><ol id=ordered-list>",
+ " <li><div>Harmony,", // Div makes us look harder to find li
+ " <li>Balance,</li>",
+ " <li>and Punctuation!?</li></li>", // Second </li> is significant
+ "</ul>"
+ ),
+ Arrays.asList(
+ "Fragment 1+1-8+6",
+ " Tag : ul 1+1-8+6",
+ " Attrib : id 1+5-1+7",
+ " Value : unordered-list 1+8-1+22",
+ " Text : \n 1+23-2+1",
+ " Tag : li 2+1-2+21",
+ " Text : From Chaos, 2+5-2+16",
+ " Text : \n 2+26-3+1",
+ " Tag : li 3+1-4+1",
+ " Text : and Disorder; arise\n 3+5-4+1",
+ " Tag : li 4+1-7+34",
+ " Tag : ol 4+5-7+29",
+ " Attrib : id 4+9-4+11",
+ " Value : ordered-list 4+12-4+24",
+ " Text : \n 4+25-5+3",
+ " Tag : li 5+3-6+3",
+ " Tag : div 5+7-6+3",
+ " Text : Harmony,\n 5+12-6+3",
+ " Tag : li 6+3-6+20",
+ " Text : Balance, 6+7-6+15",
+ " Text : \n 6+20-7+3",
+ " Tag : li 7+3-7+29",
+ " Text : and Punctuation!? 7+7-7+24",
+ " Text : \n 7+34-8+1"
+ ),
+ Arrays.asList(
+ // TODO(mikesamuel): this error message seems to be a bug.
+ // There is an error there, but the close tag is spurious.
+ "LINT testListNesting:2+21 - 26:"
+ + " End tag 'li' seen but there were unclosed elements.",
+ "LINT testListNesting:6+3 - 7:"
+ + " A 'li' start tag was seen but the previous 'li' element"
+ + " had open children.",
+ // Ditto wrong message
+ "LINT testListNesting:7+29 - 34:"
+ + " End tag 'li' seen but there were unclosed elements."
+ ),
+ Arrays.asList(
+ "<ul id=\"unordered-list\">",
+ "<li>From Chaos,</li>",
+ "<li>and Disorder; arise",
+ "</li><li><ol id=\"ordered-list\">",
+ " <li><div>Harmony,",
+ " </div></li><li>Balance,</li>",
+ " <li>and Punctuation!?</li></ol></li>",
+ "</ul>"
+ )
+ );
+ }
+
+ public void testParagraphInterrupters() throws Exception {
+ assertParsedHtml(
+ Arrays.asList(
+ "<body><p>Hi <pre>There</pre>",
+ "<p>Uh <dl><dt>Zero<dd>None<ol><li>Nested<dt><div><p>One<dd>1</dl>",
+ "<p>Li</dl>ne<hr>s",
+ "<p>Well <plaintext>Runs till the end of the <p>Document",
+ "No matter <p>What"
+ ),
+ Arrays.asList(
+ "Tag : html 1+1-5+18",
+ " Tag : head 1+1-1+1",
+ " Tag : body 1+1-5+18",
+ " Tag : p 1+7-1+13",
+ " Text : Hi 1+10-1+13",
+ " Tag : pre 1+13-1+29",
+ " Text : There 1+18-1+23",
+ " Text : \n 1+29-2+1",
+ " Tag : p 2+1-2+7",
+ " Text : Uh 2+4-2+7",
+ " Tag : dl 2+7-2+66",
+ " Tag : dt 2+11-2+19",
+ " Text : Zero 2+15-2+19",
+ " Tag : dd 2+19-2+61",
+ " Text : None 2+23-2+27",
+ " Tag : ol 2+27-2+61",
+ " Tag : li 2+31-2+61",
+ " Text : Nested 2+35-2+41",
+ // DTs don't jump out of an OL or UL list
+ " Tag : dt 2+41-2+56",
+ " Tag : div 2+45-2+56",
+ " Tag : p 2+50-2+56",
+ " Text : One 2+53-2+56",
+ " Tag : dd 2+56-2+61",
+ " Text : 1 2+60-2+61",
+ " Text : \n 2+66-3+1",
+ " Tag : p 3+1-3+13",
+ // Not interrupted by </dl>. End tags do not interrupt <p>s
+ " Text : Line 3+4-3+13",
+ " Tag : hr 3+13-3+17",
+ " Text : s\n 3+17-4+1",
+ " Tag : p 4+1-4+9",
+ " Text : Well 4+4-4+9",
+ " Tag : plaintext 4+9-5+18",
+ (" Text : Runs till the end of the <p>Document"
+ + "\nNo matter <p>What"
+ + " 4+20-5+18")
+ ),
+ Arrays.asList(
+ "LINT testParagraphInterrupters:2+56 - 60:"
+ + " A definition list item start tag was seen but the previous"
+ + " definition list item element had open children.",
+ "LINT testParagraphInterrupters:2+61 - 66:"
+ + " End tag 'dl' seen but there were unclosed elements.",
+ "LINT testParagraphInterrupters:3+6 - 11:"
+ + " End tag 'dl' seen but there were unclosed elements.",
+ "LINT testParagraphInterrupters:5+18:"
+ + " End of file seen and there were open elements."
+ ),
+ Arrays.asList(
+ "<html><head></head><body><p>Hi </p><pre>There</pre>",
+ ("<p>Uh </p><dl><dt>Zero</dt><dd>None<ol><li>Nested"
+ + "<dt><div><p>One</p></div></dt><dd>1</dd></li></ol></dd></dl>"),
+ "<p>Line</p><hr />s",
+ "<p>Well </p><plaintext>Runs till the end of the <p>Document",
+ // Not correct but I don't want to muck with DomTree rendering
+ // just for plaintext.
+ "No matter <p>What</plaintext></body></html>"
+ )
+ );
+ }
+
+ public void testParagraphNotNested() throws Exception {
+ assertParsedHtml(
+ Arrays.asList(
+ "<html>",
+ "<p>Foo</p>",
+ "<p>Bar",
+ "<p>Baz</p></p>", // Second close p opens a paragraph because...
+ "</html>"
+ ),
+ Arrays.asList(
+ "Tag : html 1+1-5+8",
+ " Text : \n 1+7-2+1",
+ " Tag : head 2+1-2+1",
+ " Tag : body 2+1-5+1",
+ " Tag : p 2+1-2+11",
+ " Text : Foo 2+4-2+7",
+ " Text : \n 2+11-3+1",
+ " Tag : p 3+1-4+1",
+ " Text : Bar\n 3+4-4+1",
+ " Tag : p 4+1-4+11",
+ " Text : Baz 4+4-4+7",
+ " Tag : p 4+11-4+15",
+ " Text : \n 4+15-5+1"
+ ),
+ Arrays.asList(
+ "LINT testParagraphNotNested:4+11 - 15:"
+ + " End tag 'p' seen but there were unclosed elements."
+ ),
+ Arrays.asList(
+ "<html>",
+ "<head></head><body><p>Foo</p>",
+ "<p>Bar",
+ "</p><p>Baz</p><p></p>",
+ "</body></html>"
+ )
+ );
+ }
+
+ public void testAnyHeadingTagCloses() throws Exception {
+ assertParsedHtml(
+ Arrays.asList(
+ "<html>",
+ "<p>Foo",
+ "<h1><p>Bar</H3>",
+ "",
+ "<h2>Baz",
+ "",
+ "<<h3>Boo</h2>", // Close tag closes h3, not enclosing h2
+ "",
+ "<p>Far></h3></h1>"
+ ),
+ Arrays.asList(
+ "Tag : html 1+1-9+18",
+ " Text : \n 1+7-2+1",
+ " Tag : head 2+1-2+1",
+ " Tag : body 2+1-9+18",
+ " Tag : p 2+1-3+1",
+ " Text : Foo\n 2+4-3+1",
+ " Tag : h1 3+1-3+16",
+ " Tag : p 3+5-3+11",
+ " Text : Bar 3+8-3+11",
+ " Text : \n\n 3+16-5+1",
+ " Tag : h2 5+1-9+13",
+ " Text : Baz\n\n< 5+5-7+2",
+ " Tag : h3 7+2-7+14",
+ " Text : Boo 7+6-7+9",
+ " Text : \n\n 7+14-9+1",
+ " Tag : p 9+1-9+8",
+ " Text : Far> 9+4-9+8"
+ ),
+ Arrays.asList(
+ "LINT testAnyHeadingTagCloses:3+11 - 16:"
+ + " End tag 'h3' seen but there were unclosed elements.",
+ "LINT testAnyHeadingTagCloses:7+9 - 14:"
+ + " End tag 'h2' seen but there were unclosed elements.",
+ "LINT testAnyHeadingTagCloses:9+8 - 13:"
+ + " End tag 'h3' seen but there were unclosed elements.",
+ "LINT testAnyHeadingTagCloses:9+13 - 18:"
+ + " End tag 'h1' seen but there were unclosed elements."
+ ),
+ Arrays.asList(
+ "<html>",
+ "<head></head><body><p>Foo",
+ "</p><h1><p>Bar</p></h1>",
+ "",
+ "<h2>Baz",
+ "",
+ "<<h3>Boo</h3>",
+ "",
+ "<p>Far></p></h2></body></html>"
+ )
+ );
+ }
+
+ public void testLinksDontNest() throws Exception {
+ assertParsedHtmlFragment(
+ Arrays.asList(
+ "<div>",
+ "<a href=foo>Foo",
+ // Links don't nest
+ "<a href=bar>Bar",
+ // unless they do. Table is a scoping element.
+ "<table><caption><a href=baz>Baz</table>boo",
+ "</a>"
+ ),
+ Arrays.asList(
+ "Fragment 1+1-5+5",
+ " Tag : div 1+1-5+5",
+ " Text : \n 1+6-2+1",
+ " Tag : a 2+1-3+1",
+ " Attrib : href 2+4-2+8",
+ " Value : foo 2+9-2+12",
+ " Text : Foo\n 2+13-3+1",
+ " Tag : a 3+1-5+5",
+ " Attrib : href 3+4-3+8",
+ " Value : bar 3+9-3+12",
+ " Text : Bar\n 3+13-4+1",
+ " Tag : table 4+1-4+40",
+ " Tag : caption 4+8-4+32",
+ " Tag : a 4+17-4+32",
+ " Attrib : href 4+20-4+24",
+ " Value : baz 4+25-4+28",
+ " Text : Baz 4+29-4+32",
+ " Text : boo\n 4+40-5+1"
+ ),
+ Arrays.<String>asList(
+ "LINT testLinksDontNest:3+1 - 13:"
+ + " An 'a' start tag seen with already an active 'a' element.",
+ "LINT testLinksDontNest:4+32 - 40:"
+ + " 'table' closed but 'caption' was still open.",
+ "LINT testLinksDontNest:4+32 - 40:"
+ + " Unclosed elements on stack.",
+ "LINT testLinksDontNest:5+5:"
+ + " End of file seen and there were open elements."
+ ),
+ Arrays.asList(
+ "<div>",
+ "<a href=\"foo\">Foo",
+ "</a><a href=\"bar\">Bar",
+ "<table><caption><a href=\"baz\">Baz</a></caption></table>boo",
+ "</a></div>"
+ )
+ );
+ }
+
+ public void testFormattingElements() throws Exception {
+ assertParsedHtml(
+ Arrays.asList(
+ "<body>",
+ "<a href=foo>Foo<nobr>Bar</a>",
+ "",
+ "<b><font COLOR=BLUE>bold&blue</b>",
+ "still blue</font>",
+ "boring",
+ "<span><b>Foo</span>Bar</b></span>"
+ ),
+ Arrays.asList(
+ "Tag : html 1+1-7+34",
+ " Tag : head 1+1-1+1",
+ " Tag : body 1+1-7+34",
+ " Text : \n 1+7-2+1",
+ " Tag : a 2+1-2+29",
+ " Attrib : href 2+4-2+8",
+ " Value : foo 2+9-2+12",
+ " Text : Foo 2+13-2+16",
+ " Tag : nobr 2+16-2+25",
+ " Text : Bar 2+22-2+25",
+ " Tag : nobr 2+16-7+27",
+ " Text : \n\n 2+29-4+1",
+ " Tag : b 4+1-4+34",
+ " Tag : font 4+4-4+30",
+ " Attrib : color 4+10-4+15",
+ " Value : BLUE 4+16-4+20",
+ " Text : bold&blue 4+21-4+30",
+ " Tag : font 4+4-5+18",
+ " Attrib : color 4+10-4+15",
+ " Value : BLUE 4+16-4+20",
+ " Text : \nstill blue 4+34-5+11",
+ " Text : \nboring\n 5+18-7+1",
+ " Tag : span 7+1-7+20",
+ " Tag : b 7+7-7+13",
+ " Text : Foo 7+10-7+13",
+ " Tag : b 7+7-7+27",
+ " Text : Bar 7+20-7+23"
+ ),
+ Arrays.asList(
+ "LINT testFormattingElements:2+25 - 29:"
+ + " End tag 'a' violates nesting rules.",
+ "LINT testFormattingElements:4+30 - 34:"
+ + " End tag 'b' violates nesting rules.",
+ "LINT testFormattingElements:7+13 - 20: Unclosed element 'b'.",
+ "LINT testFormattingElements:7+27 - 34: Unclosed element 'nobr'."
+ ),
+ Arrays.asList(
+ "<html><head></head><body>",
+ "<a href=\"foo\">Foo<nobr>Bar</nobr></a><nobr>",
+ "",
+ ("<b><font color=\"BLUE\">bold&blue</font></b>"
+ + "<font color=\"BLUE\">"),
+ "still blue</font>",
+ "boring",
+ "<span><b>Foo</b></span><b>Bar</b></nobr></body></html>"
+ )
+ );
+ }
+
+ public void testObjectElements() throws Exception {
+ assertParsedHtmlFragment(
+ Arrays.asList(
+ "<object>",
+ " <param name=\"foo\" value=\"bar\">",
+ " <param name=\"baz\" value=\"boo\">",
+ "</object>"
+ ),
+ Arrays.asList(
+ "Fragment 1+1-4+10",
+ " Tag : object 1+1-4+10",
+ " Text : \n 1+9-2+3",
+ " Tag : param 2+3-2+33",
+ " Attrib : name 2+10-2+14",
+ " Value : foo 2+15-2+20",
+ " Attrib : value 2+21-2+26",
+ " Value : bar 2+27-2+32",
+ " Text : \n 2+33-3+3",
+ " Tag : param 3+3-3+33",
+ " Attrib : name 3+10-3+14",
+ " Value : baz 3+15-3+20",
+ " Attrib : value 3+21-3+26",
+ " Value : boo 3+27-3+32",
+ " Text : \n 3+33-4+1"
+ ),
+ Arrays.<String>asList(
+ ),
+ Arrays.asList(
+ "<object>",
+ " <param name=\"foo\" value=\"bar\" />",
+ " <param name=\"baz\" value=\"boo\" />",
+ "</object>"
+ )
+ );
+ }
+
+ public void testButtonsDontNest() throws Exception {
+ assertParsedHtmlFragment(
+ Arrays.asList(
+ "<div>",
+ "<button>Foo",
+ "<button>Bar",
+ // Like links, they don't nest except when they do.
+ "<table><td><button>Baz</table>",
+ "</button></button></button>"
+ ),
+ Arrays.asList(
+ "Fragment 1+1-5+28",
+ " Tag : div 1+1-5+28",
+ " Text : \n 1+6-2+1",
+ " Tag : button 2+1-3+1",
+ " Text : Foo\n 2+9-3+1",
+ " Tag : button 3+1-5+10",
+ " Text : Bar\n 3+9-4+1",
+ " Tag : table 4+1-4+31",
+ " Tag : tbody 4+8-4+23",
+ " Tag : tr 4+8-4+23",
+ " Tag : td 4+8-4+23",
+ " Tag : button 4+12-4+23",
+ " Text : Baz 4+20-4+23",
+ " Text : \n 4+31-5+1"
+ ),
+ Arrays.asList(
+ "LINT testButtonsDontNest:3+1 - 9:"
+ + " 'button' start tag seen when there was an open 'button'"
+ + " element in scope.",
+ "LINT testButtonsDontNest:4+8 - 12:"
+ + " 'td' start tag in table body.",
+ "LINT testButtonsDontNest:4+23 - 31: Unclosed elements.",
+ "LINT testButtonsDontNest:5+10 - 19:"
+ + " End tag 'button' seen but there were unclosed elements.",
+ "LINT testButtonsDontNest:5+19 - 28:"
+ + " End tag 'button' seen but there were unclosed elements.",
+ "LINT testButtonsDontNest:5+28:"
+ + " End of file seen and there were open elements."
+ ),
+ Arrays.asList(
+ "<div>",
+ "<button>Foo",
+ "</button><button>Bar",
+ ("<table><tbody><tr><td><button>Baz"
+ + "</button></td></tr></tbody></table>"),
+ "</button></div>"
+ )
+ );
+ }
+
+ public void testImageTag() throws Exception {
+ assertParsedHtmlFragment(
+ Arrays.asList(
+ "<image src='foo.gif?a=b&c=d'>"
+ ),
+ Arrays.asList(
+ "Fragment 1+1-1+30",
+ " Tag : img 1+1-1+30",
+ " Attrib : src 1+8-1+11",
+ " Value : foo.gif?a=b&c=d 1+12-1+29"
+ ),
+ Arrays.<String>asList(
+ "LINT testImageTag:1+1 - 30: Saw a start tag 'image'."
+ ),
+ Arrays.asList(
+ "<img src=\"foo.gif?a=b&c=d\" />"
+ )
+ );
+ }
+
+ public void testIsIndex() throws Exception {
+ // Its semantics are really weird and it's deprecated, so drop it.
+ assertParsedHtml(
+ Arrays.asList(
+ "<div><isindex prompt='Blah blah'></div>"
+ ),
+ Arrays.asList( // WTF!?!
+ "Tag : html 1+1-1+40",
+ " Tag : head 1+1-1+1",
+ " Tag : body 1+1-1+40",
+ " Tag : div 1+1-1+40",
+ " Tag : form 1+6-1+6",
+ " Tag : hr 1+6-1+6",
+ " Tag : p 1+6-1+6",
+ " Tag : label 1+6-1+6",
+ " Text : Blah blah 1+33-1+34",
+ " Tag : input 1+6-1+6",
+ " Attrib : name 1+6-1+6",
+ " Value : isindex 1+6-1+6",
+ " Tag : hr 1+6-1+6"
+ ),
+ Arrays.asList(
+ "LINT testIsIndex:1+6 - 34: 'isindex' seen."
+ ),
+ Arrays.asList(
+ "<html><head></head><body><div><form><hr /><p>"
+ + "<label>Blah blah<input name=\"isindex\" /></label>"
+ + "</p><hr /></form></div></body></html>"
+ )
+ );
+ }
+
+ public void testDisablersAreCdata() throws Exception {
+ assertParsedHtmlFragment(
+ Arrays.asList(
+ "<noframes><iframe src=foo></noframes></noscript>"
+ ),
+ Arrays.asList(
+ "Fragment 1+1-1+49",
+ " Tag : noframes 1+1-1+38",
+ " Text : <iframe src=foo> 1+11-1+27"
+ ),
+ Arrays.asList(
+ ("LINT testDisablersAreCdata:1+38 - 49:"
+ + " Stray end tag 'noscript'.")
+ ),
+ Arrays.asList(
+ "<noframes><iframe src=foo></noframes>"
+ )
+ );
+ }
+
+ public void testInputElements() throws Exception {
+ assertParsedHtmlFragment(
+ Arrays.asList(
+ "<form>",
+ "<textarea>", // First and last newlines stripped
+ " <form action='a?b=c&d=e&f=g'>", // content is RCDATA
+ " </form>",
+ "</textarea>",
+ "",
+ "<select>",
+ " <option>One",
+ " <option>Two</option>",
+ "<select>", // Sometimes an open tag is a close tag.
+ " <option>Three</option>", // Option tags outside selects ignored
+ "</select>",
+ "</select>",
+ "<optgroup><option>Four</option></optgroup>",
+ "</form>"
+ ),
+ Arrays.asList(
+ "Fragment 1+1-15+8",
+ " Tag : form 1+1-15+8",
+ " Text : \n 1+7-2+1",
+ " Tag : textarea 2+1-5+12",
+ " Text : <form action='a?b=c&d=e&f=g'>\n </form>\n 2+11-5+1",
+ " Text : \n\n 5+12-7+1",
+ " Tag : select 7+1-10+9",
+ " Text : \n 7+9-8+3",
+ " Tag : option 8+3-9+3",
+ " Text : One\n 8+11-9+3",
+ " Tag : option 9+3-9+23",
+ " Text : Two 9+11-9+14",
+ " Text : \n 9+23-10+1",
+ " Text : \n Three\n\n\nFour\n 10+9-15+1"
+ ),
+ Arrays.<String>asList(
+ "LINT testInputElements:10+1 - 9:"
+ + " 'select' start tag where end tag expected.",
+ "LINT testInputElements:11+3 - 11: Stray start tag 'option'.",
+ "LINT testInputElements:12+1 - 10: Stray end tag 'select'.",
+ "LINT testInputElements:13+1 - 10: Stray end tag 'select'.",
+ "LINT testInputElements:14+1 - 11: Stray start tag 'optgroup'.",
+ "LINT testInputElements:14+11 - 19: Stray start tag 'option'."
+ ),
+ Arrays.asList(
+ "<form>",
+ "<textarea> <form action='a?b=c&d=e&f=g'>",
+ " </form>",
+ "</textarea>",
+ "",
+ "<select>",
+ " <option>One",
+ " </option><option>Two</option>",
+ "</select>",
+ " Three",
+ "",
+ "",
+ "Four",
+ "</form>"
+ )
+ );
+ }
+
+ public void testUnknownTagsNest() throws Exception {
+ assertParsedHtml(
+ Arrays.asList(
+ "<html><foo><bar>baz</bar></baz>boo<command></html>"
+ ),
+ Arrays.asList(
+ "Tag : html 1+1-1+51",
+ " Tag : head 1+7-1+7",
+ " Tag : body 1+7-1+44",
+ " Tag : foo 1+7-1+26",
+ " Tag : bar 1+12-1+26",
+ " Text : baz 1+17-1+20",
+ " Text : boo 1+32-1+35",
+ " Tag : command 1+35-1+51"
+ ),
+ Arrays.asList(
+ "LINT testUnknownTagsNest:1+26 - 32: Unclosed element 'foo'.",
+ "LINT testUnknownTagsNest:1+44 - 51:"
+ + " End tag for 'html' seen but there were unclosed elements."
+ ),
+ Arrays.asList(
+ ("<html><head></head><body>"
+ + "<foo><bar>baz</bar></foo>boo<command></command></body></html>")
+ )
+ );
+ }
+
+ public void testRegularTags() throws Exception {
+ assertParsedHtmlFragment(
+ Arrays.asList(
+ "<span><span>Foo</span>Bar</span></span>"
+ ),
+ Arrays.asList(
+ "Fragment 1+1-1+40",
+ " Tag : span 1+1-1+33",
+ " Tag : span 1+7-1+23",
+ " Text : Foo 1+13-1+16",
+ " Text : Bar 1+23-1+26"
+ ),
+ Arrays.<String>asList(
+ ),
+ Arrays.asList(
+ "<span><span>Foo</span>Bar</span>"
+ )
+ );
+ }
+
+ public void testXmp() throws Exception {
+ assertParsedHtmlFragment(
+ Arrays.asList(
+ "<p><b>Foo</b><xmp><b>Foo</b></xmp><b>Foo</b></p>"
+ ),
+ Arrays.asList(
+ "Fragment 1+1-1+49",
+ " Tag : p 1+1-1+49",
+ " Tag : b 1+4-1+14",
+ " Text : Foo 1+7-1+10",
+ " Tag : xmp 1+14-1+35",
+ " Text : <b>Foo</b> 1+19-1+29",
+ " Tag : b 1+35-1+45",
+ " Text : Foo 1+38-1+41"
+ ),
+ Arrays.<String>asList(
+ ),
+ Arrays.asList(
+ "<p><b>Foo</b><xmp><b>Foo</b></xmp><b>Foo</b></p>"
+ )
+ );
+ }
+
+ public void testColumnGroups() throws Exception {
+ assertParsedHtmlFragment(
+ Arrays.asList(
+ "<table><caption>Test</caption>",
+ "<colgroup>",
+ " <col class=red>",
+ " <col class=green>",
+ " <col class=blue>",
+ "</colgroup>",
+ "<tr><th>red<th>green<th>blue",
+ "</table>"
+ ),
+ Arrays.asList(
+ "Fragment 1+1-8+9",
+ " Tag : table 1+1-8+9",
+ " Tag : caption 1+8-1+31",
+ " Text : Test 1+17-1+21",
+ " Text : \n 1+31-2+1",
+ " Tag : colgroup 2+1-6+12",
+ " Text : \n 2+11-3+3",
+ " Tag : col 3+3-3+18",
+ " Attrib : class 3+8-3+13",
+ " Value : red 3+14-3+17",
+ " Text : \n 3+18-4+3",
+ " Tag : col 4+3-4+20",
+ " Attrib : class 4+8-4+13",
+ " Value : green 4+14-4+19",
+ " Text : \n 4+20-5+3",
+ " Tag : col 5+3-5+19",
+ " Attrib : class 5+8-5+13",
+ " Value : blue 5+14-5+18",
+ " Text : \n 5+19-6+1",
+ " Text : \n 6+12-7+1",
+ " Tag : tbody 7+1-8+1",
+ " Tag : tr 7+1-8+1",
+ " Tag : th 7+5-7+12",
+ " Text : red 7+9-7+12",
+ " Tag : th 7+12-7+21",
+ " Text : green 7+16-7+21",
+ " Tag : th 7+21-8+1",
+ " Text : blue\n 7+25-8+1"
+ ),
+ Arrays.<String>asList(
+ ),
+ Arrays.asList(
+ "<table><caption>Test</caption>",
+ "<colgroup>",
+ " <col class=\"red\" />",
+ " <col class=\"green\" />",
+ " <col class=\"blue\" />",
+ "</colgroup>",
+ "<tbody><tr><th>red</th><th>green</th><th>blue",
+ "</th></tr></tbody></table>"
+ )
+ );
+ }
+
+ public void testMalformedTables() throws Exception {
+ assertParsedHtmlFragment(
+ Arrays.asList(
+ "<div><table>",
+ " <col class=red>",
+ " <col class=green>",
+ " <col class=blue>",
+ " <tfoot>",
+ " <tr><th>red<th>green<th>blue",
+ " </tbody></tfoot>", // tbody does not close tfoot
+ " <table>", // Opens a new table
+ "</colgroup></table>"
+ ),
+ Arrays.asList(
+ "Fragment 1+1-9+20",
+ " Tag : div 1+1-9+20",
+ " Tag : table 1+6-8+3",
+ " Text : \n 1+13-2+3",
+ " Tag : colgroup 2+3-5+3",
+ " Tag : col 2+3-2+18",
+ " Attrib : class 2+8-2+13",
+ " Value : red 2+14-2+17",
+ " Text : \n 2+18-3+3",
+ " Tag : col 3+3-3+20",
+ " Attrib : class 3+8-3+13",
+ " Value : green 3+14-3+19",
+ " Text : \n 3+20-4+3",
+ " Tag : col 4+3-4+19",
+ " Attrib : class 4+8-4+13",
+ " Value : blue 4+14-4+18",
+ " Text : \n 4+19-5+3",
+ " Tag : tfoot 5+3-7+19",
+ " Text : \n 5+10-6+5",
+ " Tag : tr 6+5-7+11",
+ " Tag : th 6+9-6+16",
+ " Text : red 6+13-6+16",
+ " Tag : th 6+16-6+25",
+ " Text : green 6+20-6+25",
+ " Tag : th 6+25-7+11",
+ " Text : blue\n 6+29-7+3",
+ " Text : \n 7+19-8+3",
+ " Tag : table 8+3-9+20",
+ " Text : \n 8+10-9+1"
+ ),
+ Arrays.asList(
+ "LINT testMalformedTables:7+3 - 11: Stray end tag 'tbody'.",
+ "LINT testMalformedTables:8+3 - 10: Start tag for 'table'"
+ + " seen but the previous 'table' is still open.",
+ "LINT testMalformedTables:9+1 - 12: Stray end tag 'colgroup'.",
+ "LINT testMalformedTables:9+20: End of file seen and there were"
+ + " open elements."),
+ Arrays.asList(
+ "<div><table>",
+ " <colgroup><col class=\"red\" />",
+ " <col class=\"green\" />",
+ " <col class=\"blue\" />",
+ " </colgroup><tfoot>",
+ " <tr><th>red</th><th>green</th><th>blue",
+ " </th></tr></tfoot>",
+ " </table><table>",
+ "</table></div>"
+ )
+ );
+ }
+
+ public void testMoreTables() throws Exception {
+ assertParsedHtmlFragment(
+ Arrays.asList(
+ "<div>",
+ "<p><table>",
+ "<p>Foo</p>",
+ "<tr>",
+ "<caption><tr>",
+ "</table>"
+ ),
+ Arrays.asList(
+ "Fragment 1+1-6+9",
+ " Tag : div 1+1-6+9",
+ " Text : \n 1+6-2+1",
+ " Tag : p 2+1-2+4",
+ " Tag : p 3+1-3+11",
+ " Text : Foo 3+4-3+7",
+ " Tag : table 2+4-6+9",
+ " Text : \n\n 2+11-4+1",
+ " Tag : tbody 4+1-5+1",
+ " Tag : tr 4+1-5+1",
+ " Text : \n 4+5-5+1",
+ " Tag : caption 5+1-5+10",
+ " Tag : tbody 5+10-6+1",
+ " Tag : tr 5+10-6+1",
+ " Text : \n 5+14-6+1"
+ ),
+ Arrays.asList(
+ "LINT testMoreTables:3+1 - 4: Start tag 'p' seen in 'table'.",
+ "LINT testMoreTables:3+7 - 11: Stray end tag 'p'.",
+ "LINT testMoreTables:5+10 - 14:"
+ + " Stray 'tr' start tag in 'caption'.",
+ "LINT testMoreTables:6+9:"
+ + " End of file seen and there were open elements."
+ ),
+ Arrays.asList(
+ "<div>",
+ "<p></p><p>Foo</p><table>",
+ "",
+ "<tbody><tr>",
+ "</tr></tbody><caption></caption><tbody><tr>",
+ "</tr></tbody></table></div>"
+ )
+ );
+ }
+
+ public void testEvenMoreTables() throws Exception {
+ assertParsedHtmlFragment(
+ Arrays.asList(
+ "<table><colgroup><col></col><caption></colgroup></caption>",
+ "</thead></table>"
+ ),
+ Arrays.asList(
+ "Fragment 1+1-2+17",
+ " Tag : table 1+1-2+17",
+ " Tag : colgroup 1+8-1+29",
+ " Tag : col 1+18-1+23",
+ " Tag : caption 1+29-1+59",
+ " Text : \n 1+59-2+1"
+ ),
+ Arrays.asList(
+ "LINT testEvenMoreTables:1+23 - 29: Stray end tag 'col'.",
+ "LINT testEvenMoreTables:1+38 - 49: Stray end tag 'colgroup'.",
+ "LINT testEvenMoreTables:2+1 - 9: Stray end tag 'thead'."
+ ),
+ Arrays.asList(
+ "<table><colgroup><col /></colgroup><caption></caption>",
+ "</table>"
+ )
+ );
+ }
+
+ public void testTablesBonus() throws Exception {
+ assertParsedHtmlFragment(
+ Arrays.asList(
+ "<div><table><tbody></body></br><tr></td></tr></table></div>"
+ ),
+ Arrays.asList(
+ "Fragment 1+1-1+60",
+ " Tag : div 1+1-1+60",
+ " Tag : br 1+27-1+32",
+ " Tag : table 1+6-1+54",
+ " Tag : tbody 1+13-1+46",
+ " Tag : tr 1+32-1+46"
+ ),
+ Arrays.<String>asList(
+ "LINT testTablesBonus:1+20 - 27: Stray end tag 'body'.",
+ "LINT testTablesBonus:1+27 - 32: Stray end tag 'br'.",
+ "LINT testTablesBonus:1+27 - 32: End tag 'br'.",
+ "LINT testTablesBonus:1+36 - 41: Stray end tag 'td'."
+ ),
+ Arrays.asList(
+ "<div><br /><table><tbody><tr></tr></tbody></table></div>"
+ )
+ );
+ }
+
+ public void testTableRows() throws Exception {
+ assertParsedHtmlFragment(
+ Arrays.asList( // table does not end because none of these close the th
+ "<div><table><tr><hr><td></th></html><select></td></table></div>"
+ ),
+ Arrays.asList(
+ "Fragment 1+1-1+64",
+ " Tag : div 1+1-1+64",
+ " Tag : hr 1+17-1+21",
+ " Tag : table 1+6-1+64",
+ " Tag : tbody 1+13-1+64",
+ " Tag : tr 1+13-1+64",
+ " Tag : td 1+21-1+64",
+ " Tag : select 1+37-1+64"
+ ),
+ Arrays.<String>asList(
+ "LINT testTableRows:1+17 - 21: Start tag 'hr' seen in 'table'.",
+ "LINT testTableRows:1+25 - 30: Stray end tag 'th'.",
+ "LINT testTableRows:1+30 - 37: Stray end tag 'html'.",
+ "LINT testTableRows:1+45 - 50: Stray end tag 'td'",
+ "LINT testTableRows:1+50 - 58: Stray end tag 'table'",
+ "LINT testTableRows:1+58 - 64: Stray end tag 'div'",
+ "LINT testTableRows:1+64:"
+ + " End of file seen and there were open elements."
+ ),
+ Arrays.asList(
+ ("<div><hr /><table><tbody><tr><td><select></select>"
+ + "</td></tr></tbody></table></div>")
+ )
+ );
+ }
+
+ public void testSelects() throws Exception {
+ assertParsedHtmlFragment(
+ Arrays.asList(
+ "<select>",
+ "</optgroup><optgroup>",
+ "<optgroup>",
+ "<option>1</optgroup>",
+ "</table><hr></select>"
+ ),
+ Arrays.asList(
+ "Fragment 1+1-5+22",
+ " Tag : select 1+1-5+22",
+ " Text : \n 1+9-2+1",
+ " Tag : optgroup 2+12-3+1",
+ " Text : \n 2+22-3+1",
+ " Tag : optgroup 3+1-4+21",
+ " Text : \n 3+11-4+1",
+ " Tag : option 4+1-4+10",
+ " Text : 1 4+9-4+10",
+ " Text : \n 4+21-5+1"
+ ),
+ Arrays.asList(
+ "LINT testSelects:2+1 - 12: Stray end tag 'optgroup'",
+ "LINT testSelects:5+1 - 9: Stray end tag 'table'",
+ "LINT testSelects:5+9 - 13: Stray 'hr' start tag."
+ ),
+ Arrays.asList(
+ "<select>",
+ "<optgroup>",
+ "</optgroup><optgroup>",
+ "<option>1</option></optgroup>",
+ "</select>"
+ )
+ );
+ }
+
+ public void testTrailingEndPhase() throws Exception {
+ assertParsedHtml(
+ Arrays.asList(
+ "<html></html><br>"
+ ),
+ Arrays.asList(
+ "Tag : html 1+1-1+18",
+ " Tag : head 1+7-1+7",
+ " Tag : body 1+7-1+18",
+ " Tag : br 1+14-1+18"
+ ),
+ Arrays.asList(
+ "LINT testTrailingEndPhase:1+14 - 18: Stray 'br' start tag."
+ ),
+ Arrays.asList(
+ "<html><head></head><body><br /></body></html>"
+ )
+ );
+ }
+
+ private void assertParsedHtml(
+ List<String> htmlInput,
+ List<String> expectedParseTree,
+ List<String> expectedMessages,
+ List<String> expectedOutputHtml)
+ throws IOException, ParseException {
+ assertParsedMarkup(htmlInput, expectedParseTree, expectedMessages,
+ expectedOutputHtml, false, false);
+ }
+
+ private void assertParsedHtmlFragment(
+ List<String> htmlInput,
+ List<String> expectedParseTree,
+ List<String> expectedMessages,
+ List<String> expectedOutputHtml)
+ throws IOException, ParseException {
+ assertParsedMarkup(htmlInput, expectedParseTree, expectedMessages,
+ expectedOutputHtml, false, true);
+ }
+
+ private void assertParsedMarkup(
+ List<String> htmlInput,
+ List<String> expectedParseTree,
+ List<String> expectedMessages,
+ List<String> expectedOutputHtml,
+ boolean asXml,
+ boolean fragment)
+ throws IOException, ParseException {
+
+ System.err.println("\n\nStarting " + getName() + "\n===================");
+
+ MessageQueue mq = new SimpleMessageQueue();
+ MessageContext mc = new MessageContext();
+
+ OpenElementStack elementStack
+ = (asXml
+ ? OpenElementStack.Factory.createXmlElementStack()
+ : OpenElementStack.Factory.createHtml5ElementStack(mq));
+
+ TokenQueue<HtmlTokenType> tq = tokenizeTestInput(
+ Join.join("\n", htmlInput), false);
+
+ DomTree tree = (fragment
+ ? DomParser.parseFragment(tq, elementStack)
+ : DomParser.parseDocument(tq, elementStack));
+
+ List<String> actualParseTree = new ArrayList<String>();
+ formatWithLinePositions(tree, mc, 0, new IdentityHashMap<DomTree, Void>(),
+ actualParseTree);
+ assertListsEqual(expectedParseTree, actualParseTree);
+
+ List<String> actualMessages = new ArrayList<String>();
+ for (Message message : mq.getMessages()) {
+ String messageText = (message.getMessageLevel().name() + " "
+ + message.format(mc));
+ actualMessages.add(messageText);
+ }
+ assertListsEqual(expectedMessages, actualMessages, 0);
+
+ RenderContext context = new RenderContext(mc, new StringBuilder());
+ tree.render(context);
+ List<String> outputHtml = Arrays.asList(context.out.toString().split("\n"));
+ assertListsEqual(expectedOutputHtml, outputHtml);
+ }
+
+ private TokenQueue<HtmlTokenType> tokenizeTestInput(
+ String sgmlInput, boolean asXml) {
+ InputSource is = new InputSource(URI.create("test:///" + getName()));
+
+ CharProducer cp = CharProducer.Factory.create(
+ new StringReader(sgmlInput), is);
+ HtmlLexer lexer = new HtmlLexer(cp);
+ lexer.setTreatedAsXml(asXml);
+ return new TokenQueue<HtmlTokenType>(
+ lexer, is, Criterion.Factory.<Token<HtmlTokenType>>optimist());
+ }
+
+ private void formatWithLinePositions(
+ DomTree tree, MessageContext mc, int depth,
+ IdentityHashMap<DomTree, ?> seen, List<? super String> out) {
+
+ StringBuilder sb = new StringBuilder();
+ for (int i = depth; --i >= 0;) { sb.append(" "); }
+ sb.append(tree.toString());
+ FilePosition pos = tree.getFilePosition();
+ if (pos != null) {
+ sb.append(' ')
+ .append(pos.startLineNo()).append('+').append(pos.startCharInLine())
+ .append('-')
+ .append(pos.endLineNo()).append('+').append(pos.endCharInLine());
+ }
+
+ if (seen.containsKey(tree)) {
+ sb.append(" !DUPE");
+ out.add(sb.toString());
+ return;
+ } else {
+ seen.put(tree, null);
+ out.add(sb.toString());
+ }
+
+ for (DomTree child : tree.children()) {
+ formatWithLinePositions(child, mc, depth + 1, seen, out);
+ }
}
}
Modified: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/parser/js/ParserTest.java
==============================================================================
--- /trunk/src/javatests/com/google/caja/parser/js/ParserTest.java (original)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/parser/js/ParserTest.java Thu Dec 20 08:56:04 2007
@@ -178,7 +178,7 @@
Statement parseTree = TestUtil.parseTree(getClass(), testFile, mq);
TestUtil.checkFilePositionInvariants(parseTree);
- RenderContext rc = new RenderContext(mc, new StringBuilder(), true);
+ RenderContext rc = new RenderContext(mc, new StringBuilder(), paranoid);
parseTree.render(rc);
rc.newLine();
Modified: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/parser/js/rendergolden1.txt
==============================================================================
--- /trunk/src/javatests/com/google/caja/parser/js/rendergolden1.txt (original)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/parser/js/rendergolden1.txt Thu Dec 20 08:56:04 2007
@@ -24,7 +24,7 @@
};
switch (foo()) {
case 1:
- return 'panic';
+ return "panic";
case 2:
if (a === 4) {
break;
Modified: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/parser/quasiliteral/DefaultRewriterTest.java
==============================================================================
--- /trunk/src/javatests/com/google/caja/parser/quasiliteral/DefaultRewriterTest.java (original)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/parser/quasiliteral/DefaultRewriterTest.java Thu Dec 20 08:56:04 2007
@@ -19,7 +19,6 @@
import com.google.caja.lexer.JsLexer;
import com.google.caja.lexer.JsTokenQueue;
import com.google.caja.parser.ParseTreeNode;
-import com.google.caja.parser.ParseTreeNodes;
import com.google.caja.parser.js.Parser;
import com.google.caja.parser.js.Statement;
import com.google.caja.reporting.MessageContext;
Modified: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/parser/quasiliteral/MatchTest.java
==============================================================================
--- /trunk/src/javatests/com/google/caja/parser/quasiliteral/MatchTest.java (original)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/parser/quasiliteral/MatchTest.java Thu Dec 20 08:56:04 2007
@@ -22,7 +22,6 @@
import com.google.caja.parser.js.ExpressionStmt;
import com.google.caja.parser.js.FormalParam;
import com.google.caja.parser.js.FunctionConstructor;
-import com.google.caja.parser.js.FunctionDeclaration;
import com.google.caja.parser.js.Identifier;
import com.google.caja.parser.js.IntegerLiteral;
import com.google.caja.parser.js.Operation;
Modified: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/plugin/CompiledPluginTest.java
==============================================================================
--- /trunk/src/javatests/com/google/caja/plugin/CompiledPluginTest.java (original)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/plugin/CompiledPluginTest.java Thu Dec 20 08:56:04 2007
@@ -40,8 +40,6 @@
import java.util.ArrayList;
import java.util.List;
-import org.mozilla.javascript.JavaScriptException;
-
import junit.framework.TestCase;
/**
Added: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/util/JoinTest.java
==============================================================================
--- (empty file)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/util/JoinTest.java Thu Dec 20 08:56:04 2007
@@ -0,0 +1,18 @@
+// Copyright 2007 Google Inc. All Rights Reserved.
+
+package com.google.caja.util;
+
+import junit.framework.TestCase;
+
+/**
+ * @author msa...@google.com (Mike Samuel)
+ */
+public class JoinTest extends TestCase {
+ public void testJoin() {
+ assertEquals("", Join.join(""));
+ assertEquals("", Join.join("foo"));
+ assertEquals("barFOObaz", Join.join("FOO", "bar", "baz"));
+ assertEquals("barFOObazFOO", Join.join("FOO", "bar", "baz", ""));
+ assertEquals(",,,", Join.join(",", "", "", "", ""));
+ }
+}
Added: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/util/MoreAsserts.java
==============================================================================
--- (empty file)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/util/MoreAsserts.java Thu Dec 20 08:56:04 2007
@@ -0,0 +1,142 @@
+// Copyright (C) 2007 Google Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package com.google.caja.util;
+
+import java.util.Formatter;
+import java.util.List;
+import java.util.ListIterator;
+
+import junit.framework.Assert;
+
+/**
+ * Extensions to junit.framework.Asserts that can be statically imported as by
+ * {@code import static com.google.caja.util.MoreAsserts.*;}.
+ *
+ * @author mikes...@gmail.com
+ */
+public final class MoreAsserts {
+
+ /**
+ * Fails iff the contents of the two lists differ according to
+ * {@code Object.equals}.
+ * Tries to present the differences as nice diffs.
+ * <p>
+ * TODO(mikesamuel): maybe actually diff using
+ * http://www.incava.org/projects/java/java-diff/
+ */
+ public static <T> void assertListsEqual(
+ List<? extends T> expected, List<? extends T> actual) {
+ assertListsEqual(expected, actual, 2);
+ }
+
+ /**
+ * Fails iff the contents of the two lists differ according to
+ * {@code Object.equals}.
+ * Tries to present the differences as nice diffs.
+ * <p>
+ * TODO(mikesamuel): maybe actually diff using
+ * http://www.incava.org/projects/java/java-diff/
+ *
+ * @param diffContext the number of extra lines to show if there are errors.
+ */
+ public static <T> void assertListsEqual(
+ List<? extends T> expected, List<? extends T> actual, int diffContext) {
+ int m = expected.size();
+ int n = actual.size();
+
+ int commonPrefix = 0;
+ {
+ ListIterator<? extends T> i = expected.listIterator();
+ ListIterator<? extends T> j = actual.listIterator();
+
+ while (i.hasNext() && j.hasNext() && areEqual(i.next(), j.next())) {
+ ++commonPrefix;
+ }
+ }
+
+ if (commonPrefix == Math.max(m, n)) {
+ // All are equal
+ return;
+ }
+
+ int commonSuffix = 0;
+ if (commonPrefix != Math.min(m, n)) {
+ ListIterator<? extends T> i = expected.listIterator(m);
+ ListIterator<? extends T> j = actual.listIterator(n);
+
+ int max = Math.min(m, n) - commonPrefix;
+ while (commonSuffix < max && i.hasPrevious() && j.hasPrevious()
+ && areEqual(i.previous(), j.previous())) {
+ ++commonSuffix;
+ }
+ }
+
+ Assert.fail(
+ "Expected: {{{\n"
+ + snippet(expected,
+ Math.max(commonPrefix - diffContext, 0),
+ Math.min(m, m - commonSuffix + diffContext))
+ + "\n}}} != {{{\n"
+ + snippet(actual,
+ Math.max(commonPrefix - diffContext, 0),
+ Math.min(n, n - commonSuffix + diffContext))
+ + "\n}}}");
+ }
+
+ private static String snippet(List<?> a, int start, int end) {
+ StringBuilder sb = new StringBuilder();
+ if (start != 0) {
+ sb.append("\t...");
+ }
+
+ Formatter f = new Formatter(sb);
+ int index = start;
+ for (Object item : a.subList(start, end)) {
+ if (sb.length() != 0) { sb.append('\n'); }
+ if (item != null) {
+ String type = item.getClass().getSimpleName();
+ f.format("\t%3d %s: %s", Integer.valueOf(index),
+ abbreviatedString("" + item, 204 - type.length()), type);
+ } else {
+ f.format("\t%3d <null>", Integer.valueOf(index));
+ }
+ ++index;
+ }
+ if (end < a.size()) {
+ if (sb.length() != 0) { sb.append('\n'); }
+ sb.append("\t...");
+ }
+ return sb.toString();
+ }
+
+ private static String abbreviatedString(String s, int maxLen) {
+ s = s.replace("\\", "\\\\")
+ .replace("\n", "\\n")
+ .replace("\r", "\\r");
+ if (s.length() > maxLen) {
+ System.err.println("<<<" + s + ">>>");
+ int headLen = (maxLen - 3) / 2;
+ int tailLen = maxLen - 3 - headLen;
+ s = s.substring(0, headLen) + "..." + s.substring(s.length() - tailLen);
+ }
+ return "`" + s + "`";
+ }
+
+ private static boolean areEqual(Object a, Object b) {
+ return a != null ? a.equals(b) : b == null;
+ }
+
+ private MoreAsserts() { /* not instantiable */ }
+}
Modified: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/util/RhinoTestBed.java
==============================================================================
--- /trunk/src/javatests/com/google/caja/util/RhinoTestBed.java (original)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/javatests/com/google/caja/util/RhinoTestBed.java Thu Dec 20 08:56:04 2007
@@ -26,6 +26,8 @@
import java.io.StringWriter;
import java.io.Writer;
+import junit.framework.Assert;
+
import org.mozilla.javascript.Context;
import org.mozilla.javascript.ScriptableObject;
@@ -67,8 +69,7 @@
}
return result;
} catch (org.mozilla.javascript.JavaScriptException e) {
- junit.framework.TestCase.fail(e.details() + "\n"
- + e.getScriptStackTrace());
+ Assert.fail(e.details() + "\n" + e.getScriptStackTrace());
return null;
} finally {
Context.exit();
Added: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/third_party/java/htmlparser/LICENSE.txt
==============================================================================
--- (empty file)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/third_party/java/htmlparser/LICENSE.txt Thu Dec 20 08:56:04 2007
@@ -0,0 +1,27 @@
+This is for the HTML parser as a whole. For the copyright notices for
+individual files, please see individual files.
+
+/*
+ * Copyright (c) 2005, 2006, 2007 Henri Sivonen
+ * Copyright (c) 2007 Mozilla Foundation
+ * Portions of comments Copyright 2004-2007 Apple Computer, Inc., Mozilla
+ * Foundation, and Opera Software ASA.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
\ No newline at end of file
Added: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/third_party/java/htmlparser/README.txt
==============================================================================
--- (empty file)
+++ changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/third_party/java/htmlparser/README.txt Thu Dec 20 08:56:04 2007
@@ -0,0 +1,3 @@
+An HTML5 parser.
+
+-- Henri Sivonen (hsiv...@iki.fi).
Added: changes/mikesamuel/malformed-html-20-Dec-2007/trunk/src/third_party/java/htmlparser/htmlparser.jar
==============================================================================
Binary file. No diff available.
== java/com/google/caja/lexer/HtmlLexer.java ==
234: Pull this line up to previous.
== java/com/google/caja/lexer/HtmlTextEscapingMode.java ==
69: Why are these tags "exempt" from escaping? It seems this map
describes the specific escaping *mode* of the tag, not that it's
exempt from escaping entirely. I was confused.
124: Rename to "isTag..."?
== java/com/google/caja/lexer/InputElementJoiner.java ==
lgtm
== java/com/google/caja/lexer/Token.java ==
lgtm
== java/com/google/caja/opensocial/DefaultGadgetRewriter.java ==
lgtm
== java/com/google/caja/parser/AbstractParseTreeNode.java ==
lgtm
== java/com/google/caja/parser/css/CssPropertySignature.java ==
lgtm
== java/com/google/caja/parser/html/OpenElementStack.java ==
61,72: The canonicalize*() methods could probably be pushed down into
AbstractElementStack and the one lone external use of
canonicalizeAttributeName() refactored into the *ElementStack.
98: Why does open() not require a FilePosition, like close()?
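To make the 61,72 suggestion concrete, here is a minimal sketch of pushing canonicalization down into a shared base class, where each concrete stack only declares whether names are case-sensitive. All class and method names below are hypothetical stand-ins, not the actual OpenElementStack / AbstractElementStack API; the only facts it relies on are that HTML element and attribute names are case-insensitive while XML names are case-sensitive.

```java
import java.util.Locale;

/** Hypothetical sketch: canonicalization lives in the base class. */
public final class CanonicalizeSketch {
  private CanonicalizeSketch() {}

  /** Stand-in for AbstractElementStack; not the real class. */
  public abstract static class BaseStack {
    /** True for XML, where names are case-sensitive; false for HTML. */
    protected abstract boolean namesAreCaseSensitive();

    public final String canonicalizeElementName(String name) {
      return canonicalize(name);
    }

    public final String canonicalizeAttributeName(String name) {
      return canonicalize(name);
    }

    private String canonicalize(String name) {
      // Locale.ENGLISH avoids locale-dependent case mapping (e.g. Turkish dotless i).
      return namesAreCaseSensitive() ? name : name.toLowerCase(Locale.ENGLISH);
    }
  }

  /** Stand-in for Html5ElementStack. */
  public static final class HtmlStack extends BaseStack {
    @Override protected boolean namesAreCaseSensitive() { return false; }
  }

  /** Stand-in for XmlElementStack. */
  public static final class XmlStack extends BaseStack {
    @Override protected boolean namesAreCaseSensitive() { return true; }
  }

  public static void main(String[] args) {
    System.out.println(new HtmlStack().canonicalizeElementName("TBODY"));  // tbody
    System.out.println(new XmlStack().canonicalizeElementName("myTag"));   // myTag
  }
}
```

With this shape the lone external caller of canonicalizeAttributeName() would go through whichever stack it already holds instead of needing to know which naming rules apply.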
== java/com/google/caja/parser/html/AbstractElementStack.java ==
lgtm
== java/com/google/caja/parser/html/XmlElementStack.java ==
lgtm
== java/com/google/caja/parser/html/Html5ElementStack.java ==
136: I'm confused that this method returns different "levels" of
structure depending on whether there is "extra" information in what
comes in, i.e., the branch starting at 173 vs. that starting at 179.
An example of input/output for both cases might help visualize what is
really going on here.
218: Comment starting here has inconsistent quoting in the {@code ...} sections.
197: +newline
273: Can this class be pulled out to the top level? Top-level classes
allow the structure of the code (and diffs thereof) to be more
browse-able with the filesystem.
273: Methods of this class should be public/private/protected/... --
or use package private iff that's actually what you mean.
327: Placeholder for issue we discussed, where I was initially
confused about why you were creating a 0-length token here.
527: Move this declaration to class header.
525: Perhaps include a reference to closeUnclosedNodes() in the comment.
557,562: Can use body.value() and html.value() instead of literal strings.
535: I was confused -- why is "select" special here?
540: I was confused about why it's not FilePosition.endOf(...).
596: The name "up" meant, to me -- well -- up! How about "np"?
== java/com/google/caja/parser/html/DomParserMessageType.java ==
lgtm
== java/com/google/caja/parser/html/IllegalDocumentStateException.java ==
lgtm
== java/com/google/caja/parser/html/DomParser.java ==
73: Unprotected debug println.
102: I had a hard time parsing this comment. What do you mean,
whitespace outside the root element is not significant? Which root
element? You just said whitespace is significant for XML except where
specified by the schema.....
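On 73: one common way to protect a debug println is to route it through a flag that is off by default. A minimal sketch, assuming a hypothetical system property named `caja.debug.DomParser`; nothing in this change defines such a property. The same pattern would also cover the "Starting ..." banner that assertParsedMarkup prints to System.err in DomParserTest.

```java
/** Sketch: debug output that stays silent unless explicitly enabled. */
public final class GuardedDebug {
  // Hypothetical property name. Boolean.getBoolean returns true only when the
  // system property exists and equals "true" (ignoring case).
  private static final boolean ENABLED =
      Boolean.getBoolean("caja.debug.DomParser");

  private GuardedDebug() {}

  public static boolean isEnabled() { return ENABLED; }

  /** Writes to stderr only when debugging is enabled. */
  public static void debug(String message) {
    if (ENABLED) {
      System.err.println(message);
    }
  }

  public static void main(String[] args) {
    // Prints nothing unless run with -Dcaja.debug.DomParser=true.
    debug("Starting parse");
  }
}
```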
== java/com/google/caja/parser/html/DomTree.java ==
lgtm
== java/com/google/caja/plugin/ExpressionSanitizerCaja.java ==
lgtm
== java/com/google/caja/plugin/GxpCompiler.java ==
lgtm
== java/com/google/caja/plugin/HtmlPluginCompiler.java ==
lgtm
== java/com/google/caja/plugin/HtmlPluginCompilerMain.java ==
lgtm
Fwiw, we should all think about when to bobbitt non-quasiliteral JS
sanitization now that the quasi stuff seems to be working okay.
== java/com/google/caja/util/Join.java ==
lgtm
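For reference, the behavior JoinTest pins down (separator only between adjacent items, no leading or trailing separator, empty items preserved) fits in a few lines. This is a sketch reconstructed from the test assertions, not the actual Join source; the varargs and Iterable overloads are assumptions based on how Join.join is called in JoinTest and in DomParserTest (where it is passed a List<String>).

```java
import java.util.Arrays;
import java.util.Iterator;

/** Hypothetical stand-in for com.google.caja.util.Join, written from JoinTest. */
public final class JoinSketch {
  private JoinSketch() {}

  /** Joins items with sep between each adjacent pair. */
  public static String join(String sep, Iterable<?> items) {
    StringBuilder sb = new StringBuilder();
    Iterator<?> it = items.iterator();
    if (it.hasNext()) {
      sb.append(it.next());
      while (it.hasNext()) {
        sb.append(sep).append(it.next());
      }
    }
    return sb.toString();
  }

  /** Varargs convenience overload; join(sep) with no items returns "". */
  public static String join(String sep, String... items) {
    return join(sep, Arrays.asList(items));
  }

  public static void main(String[] args) {
    System.out.println(join("FOO", "bar", "baz"));  // barFOObaz
    System.out.println(join(",", "", "", "", ""));  // ,,,
  }
}
```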
== javatests/com/google/caja/lexer/htmllexerinput1.html ==
lgtm
== javatests/com/google/caja/lexer/htmllexergolden1.txt ==
lgtm
== javatests/com/google/caja/lexer/htmllexerinput2.xml ==
lgtm
== javatests/com/google/caja/lexer/htmllexergolden2.txt ==
lgtm
== javatests/com/google/caja/lexer/HtmlLexerTest.java ==
lgtm
== javatests/com/google/caja/opensocial/example-rewritten.xml ==
lgtm
== javatests/com/google/caja/parser/css/CssParserTest.java ==
lgtm
== javatests/com/google/caja/parser/html/DomParserTest.java ==
*: In general, I'm wondering about the detailed golden output bent of
the tests in this class. Did you write out all the golden outputs by
hand, or did you run the parser, check that all is okay, then snapshot
the outputs for posterity? In any case, it seems that just having
"initial input" HTML, and "final rendered" HTML, pegging the ends of
the chain as it were, would be adequate since it would allow the stuff
in between to be refactored without too much pain. On the other hand,
you are after all way more familiar than I am with the implications of
the inter-relationships between the HTML parsing rules. What do you
think?
724: Like take this for example. You can test that the <p> is not
incorrectly nested without having to verify the intricate details of
all the intermediate pieces.
374: Just to be sure if I understand: is this test exercising the
alternate paths in Html5ElementStack.java, line 173 vs. 179, that I
asked about earlier?
1038: Wired spelling of wierd. (Not inappropriately, given the subject matter.)
== javatests/com/google/caja/parser/js/ParserTest.java ==
lgtm
== javatests/com/google/caja/parser/js/rendergolden1.txt ==
lgtm
== javatests/com/google/caja/parser/quasiliteral/DefaultRewriterTest.java ==
lgtm
== javatests/com/google/caja/parser/quasiliteral/MatchTest.java ==
lgtm
== javatests/com/google/caja/plugin/CompiledPluginTest.java ==
lgtm
== javatests/com/google/caja/util/JoinTest.java ==
lgtm
== javatests/com/google/caja/util/MoreAsserts.java ==
129: Do you really mean to dribble to System.err here? Is the idea so
you can have some record of the full string, but be able to put the
abbreviated one in error msgs, etc?
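Related to the MoreAsserts question: the failure message only shows a window around the mismatch, derived from the longest common prefix and a common suffix that is not allowed to overlap it. A standalone sketch of that windowing, mirroring the arithmetic in assertListsEqual; the names DiffWindow, prefixAndSuffix and window are hypothetical and not part of the change.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of the prefix/suffix windowing used by MoreAsserts.assertListsEqual. */
public final class DiffWindow {
  private DiffWindow() {}

  private static boolean areEqual(Object a, Object b) {
    return a != null ? a.equals(b) : b == null;
  }

  /** Returns {commonPrefix, commonSuffix}; the suffix never overlaps the prefix. */
  public static int[] prefixAndSuffix(List<?> expected, List<?> actual) {
    int m = expected.size();
    int n = actual.size();
    int min = Math.min(m, n);
    int prefix = 0;
    while (prefix < min && areEqual(expected.get(prefix), actual.get(prefix))) {
      ++prefix;
    }
    int suffix = 0;
    while (suffix < min - prefix
           && areEqual(expected.get(m - 1 - suffix), actual.get(n - 1 - suffix))) {
      ++suffix;
    }
    return new int[] { prefix, suffix };
  }

  /** Returns the [start, end) range of a list of the given size to display. */
  public static int[] window(int size, int prefix, int suffix, int context) {
    int start = Math.max(prefix - context, 0);
    int end = Math.min(size, size - suffix + context);
    return new int[] { start, end };
  }

  public static void main(String[] args) {
    List<String> expected = Arrays.asList("a", "b", "c", "d");
    List<String> actual = Arrays.asList("a", "x", "c", "d");
    int[] ps = prefixAndSuffix(expected, actual);
    int[] we = window(expected.size(), ps[0], ps[1], 0);
    int[] wa = window(actual.size(), ps[0], ps[1], 0);
    // With no context lines, only the mismatched element is shown: [b] vs [x]
    System.out.println(expected.subList(we[0], we[1]) + " vs "
        + actual.subList(wa[0], wa[1]));
  }
}
```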
== javatests/com/google/caja/AllTests.java ==
lgtm
== javatests/com/google/caja/util/RhinoTestBed.java ==
lgtm
Ihab
== java/com/google/caja/lexer/CssLexer.java ==
lgtm
== java/com/google/caja/lexer/HtmlLexer.java ==
234: Pull this line up to previous.
== java/com/google/caja/lexer/HtmlTextEscapingMode.java ==
69: Why are these tags "exempt" from escaping? It seems this map
describes the specific escaping *mode* of the tag, not that it's
exempt from escaping entirely. I was confused.
124: Rename to "isTag..."?
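To illustrate the distinction: if the map records a mode per tag rather than an exemption, it might look like the sketch below. The enum and constant names are hypothetical and are not Caja's actual HtmlTextEscapingMode. The tag assignments roughly follow the HTML5 tokenizer, with raw-text content labeled CDATA here.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical: how an element's text content is tokenized/escaped. */
enum TextMode {
  PCDATA,      // normal content: entities decoded, tags recognized
  RCDATA,      // entities decoded, but only the matching end tag ends the text
  CDATA,       // raw text: no entity decoding, only the matching end tag ends it
  PLAIN_TEXT;  // everything up to end of input is text

  private static final Map<String, TextMode> BY_TAG = new HashMap<>();
  static {
    for (String t : new String[] { "title", "textarea" }) {
      BY_TAG.put(t, RCDATA);
    }
    for (String t : new String[] {
        "script", "style", "xmp", "iframe", "noembed", "noframes" }) {
      BY_TAG.put(t, CDATA);
    }
    BY_TAG.put("plaintext", PLAIN_TEXT);
  }

  /** Every tag has a mode; tags not listed above get the default, PCDATA. */
  static TextMode forTag(String tagName) {
    return BY_TAG.getOrDefault(tagName.toLowerCase(Locale.ENGLISH), PCDATA);
  }
}
```

Framed this way, no tag is "exempt". Each tag simply maps to a mode, and ordinary elements fall back to PCDATA.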
== java/com/google/caja/lexer/InputElementJoiner.java ==
lgtm
== java/com/google/caja/lexer/Token.java ==
lgtm
== java/com/google/caja/opensocial/DefaultGadgetRewriter.java ==
lgtm
== java/com/google/caja/parser/AbstractParseTreeNode.java ==
lgtm
== java/com/google/caja/parser/css/CssPropertySignature.java ==
lgtm
== java/com/google/caja/parser/html/OpenElementStack.java ==
61,72: The canonicalize*() methods could probably be pushed down into
AbstractElementStack and the one lone external use of
canonicalizeAttributeName() refactored into the *ElementStack.
98: Why does open() not require a FilePosition, like close()?
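Here is a rough sketch of the first suggestion. The base class owns canonicalization, and each subclass only decides whether names are case-insensitive. All class names are hypothetical stand-ins and are not the actual AbstractElementStack API.

```java
import java.util.Locale;

/** Hypothetical stand-in for a shared element-stack base class. */
abstract class AbstractStackSketch {
  /** Subclasses decide whether element/attribute names are case-insensitive. */
  protected abstract boolean isCaseInsensitive();

  protected String canonicalizeElementName(String name) {
    return canonicalize(name);
  }

  protected String canonicalizeAttributeName(String name) {
    return canonicalize(name);
  }

  private String canonicalize(String name) {
    return isCaseInsensitive() ? name.toLowerCase(Locale.ENGLISH) : name;
  }
}

/** HTML: tag and attribute names fold to lower case. */
final class HtmlStackSketch extends AbstractStackSketch {
  @Override protected boolean isCaseInsensitive() { return true; }
}

/** XML: names are case-sensitive and kept exactly as written. */
final class XmlStackSketch extends AbstractStackSketch {
  @Override protected boolean isCaseInsensitive() { return false; }
}
```

With this layout, callers outside the stack never need to canonicalize names themselves.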
== java/com/google/caja/parser/html/AbstractElementStack.java ==
lgtm
== java/com/google/caja/parser/html/XmlElementStack.java ==
lgtm
== java/com/google/caja/parser/html/Html5ElementStack.java ==
136: I'm confused that this method returns different "levels" of
structure depending on whether the input carries "extra" information,
i.e., the branch starting at 173 vs. the one starting at 179.
An example of input/output for both cases might help visualize what is
really going on here.
218: Comment starting here has inconsistent quoting in the {@code ...} sections.
197: +newline
273: Can this class be pulled out to the top level? Top-level classes
make the structure of the code (and diffs thereof) easier to browse
in the filesystem.
273: Methods of this class should be explicitly public, private, or
protected, or left package-private iff that's actually what you mean.
327: Placeholder for issue we discussed, where I was initially
confused about why you were creating a 0-length token here.
527: Move this declaration to class header.
525: Perhaps include a reference to closeUnclosedNodes() in the comment.
557,562: Can use body.value() and html.value() instead of literal strings.
535: I was confused -- why is "select" special here?
  > Handle the token as follows:
  > Parse error. Act as if the token had been an end tag with the tag
  > name "select" instead.
540: I was confused about why it's not FilePosition.endOf(...).
596: The name "up" meant, to me -- well -- up! How about "np"?
== java/com/google/caja/parser/html/DomParserMessageType.java ==
lgtm
== java/com/google/caja/parser/html/IllegalDocumentStateException.java ==
lgtm
== java/com/google/caja/parser/html/DomParser.java ==
73: Unprotected debug println.
102: I had a hard time parsing this comment. What do you mean,
whitespace outside the root element is not significant? Which root
element? You just said whitespace is significant for XML except where
specified by the schema...
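The comment may refer to the XML grammar rule that whitespace before and after the document element is prolog/epilog "Misc" rather than content. If so, the JDK's own DOM parser (javax.xml.parsers, not Caja's DomParser) shows the difference. This sketch only illustrates that reading and does not show what DomParser does.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Illustration using the standard JAXP DOM parser. */
final class XmlWhitespaceDemo {
  private XmlWhitespaceDemo() {}

  static Document parse(String xml) throws Exception {
    return DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));
  }

  /** Nodes directly under the Document; whitespace outside the root is not among them. */
  static int topLevelNodeCount(String xml) throws Exception {
    return parse(xml).getChildNodes().getLength();
  }

  /** Character data inside the root element, whitespace included. */
  static String rootText(String xml) throws Exception {
    return parse(xml).getDocumentElement().getTextContent();
  }
}
```

Parsing "\n  <root> x </root>\n\n" gives a document with exactly one child node, the root element. A DOM Document cannot hold Text children, so the outer whitespace is simply not represented. The root's text content, by contrast, is " x " with its whitespace intact.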
== java/com/google/caja/parser/html/DomTree.java ==
lgtm
== java/com/google/caja/plugin/ExpressionSanitizerCaja.java ==
lgtm
== java/com/google/caja/plugin/GxpCompiler.java ==
lgtm
== java/com/google/caja/plugin/HtmlPluginCompiler.java ==
lgtm
== java/com/google/caja/plugin/HtmlPluginCompilerMain.java ==
lgtm
Fwiw, we should all think about when to bobbitt non-quasiliteral JS
sanitization now that the quasi stuff seems to be working okay.