RuntimeException trying to clean "bad tags" from HTML in JEditorPane

Phil Powell

Apr 27, 2011, 11:31:45 AM
To: comp.lang.java.gui
According to http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4695909,
the intermittent ArrayIndexOutOfBoundsException that I have been getting
whenever I load a particular URL into a JEditorPane using setPage() may
also be due to the JEditorPane containing HTML with a <META> tag and/or
a <!-- comment --> tag. Based on code I found in the aforementioned bug
report, I created a class, SimpleHTMLRenderableEditorPane, which extends
JEditorPane and should automatically strip out the "bad tags":

[code]
/*
* SimpleHTMLRenderableEditorPane.java
*
* Created on March 13, 2007, 3:39 PM
*
* To change this template, choose Tools | Template Manager
* and open the template in the editor.
*/

package com.ppowell.tools.ObjectTools.SwingTools;

import javax.swing.JEditorPane;

/**
* A safer version of {@link javax.swing.JEditorPane}
* @author Phil Powell
* @version JDK 1.6.0
*/
public class SimpleHTMLRenderableEditorPane extends JEditorPane {

    //--------------------------- --* CONSTRUCTORS *-- ---------------------------
    // <editor-fold defaultstate="collapsed" desc=" Constructors ">
    /** Creates a new instance of SimpleHTMLRenderableEditorPane */
    public SimpleHTMLRenderableEditorPane() {
        super();
    }
    // </editor-fold>
    //----------------------- --* GETTER/SETTER METHODS *-- ----------------------
    // <editor-fold defaultstate="collapsed" desc=" Getter/Setter Methods ">
    /**
     * Overridden to work around HTML rendering bug (Bug ID: 4695909).
     * @param text the text to display; the META tag is stripped first on Java 1.4
     */
    @Override
    public void setText(String text) {
        // Workaround for Bug ID: 4695909 in Java 1.4:
        // JEditorPane does not handle the META tag in the HTML HEAD
        if (text != null && isJava14()
                && "text/html".equalsIgnoreCase(getContentType())) {
            text = stripMetaTag(text);
        }
        super.setText(text);
    }
    // </editor-fold>
    //--------------------------- --* OTHER METHODS *-- --------------------------
    // <editor-fold defaultstate="collapsed" desc=" Methods ">
    /**
     * Clean HTML to remove things like <script>, <style>, <object>,
     * <embed>, and <!-- -->
     * Based upon <a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4695909">bug report</a>
     */
    public void cleanHTML() {
        try {
            String html = getText();
            if (html != null && !html.equals("")) {
                // (?i) ignores case; (?s) lets '.' match line breaks too.
                // The stray '>' after the optional closing tags is removed.
                String[] patternArray = {
                    "(?is)<script\\b[^>]*>.*?</script\\s*>",
                    "(?is)<style\\b[^>]*>.*?</style\\s*>",
                    "(?is)<object\\b[^>]*>.*?</object\\s*>",
                    "(?is)<embed\\b[^>]*>(\\s*</embed\\s*>)?",
                    "(?s)<!--.*?-->"
                };
                for (int i = 0; i < patternArray.length; i++) {
                    html = html.replaceAll(patternArray[i], "");
                }
                setText(html);
            }
        } catch (Exception e) {
            // Report the failure instead of silently swallowing it
            e.printStackTrace();
        }
    }

    /**
     * Determine if java version is 1.4.
     * @return true if java version is 1.4.x....
     */
    private boolean isJava14() {
        String version = System.getProperty("java.version");
        return version.startsWith("1.4");
    }

    /**
     * Workaround for Bug ID: 4695909 in java 1.4, fixed in 1.5
     * JEditorPane fails to display HTML BODY when META
     * tag included in HEAD section.
     *
     * <html>
     * <head>
     * <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
     * </head>
     * <body>
     *
     * @param text html to strip.
     * @return same HTML text w/o the META tag.
     */
    private String stripMetaTag(String text) {
        // Upper-case copy used for searching, comparison and indexing
        String textUpperCase = text.toUpperCase();

        int indexHead = textUpperCase.indexOf("<HEAD");
        int indexMeta = textUpperCase.indexOf("<META ");
        int indexBody = textUpperCase.indexOf("<BODY");

        // META not found, or not between HEAD and BODY: nothing to strip
        if (indexMeta == -1 || indexHead == -1 || indexMeta < indexHead
                || (indexBody != -1 && indexMeta > indexBody)) {
            return text;
        }

        // Find end of META tag text
        int indexMetaEnd = textUpperCase.indexOf(">", indexMeta);
        if (indexMetaEnd == -1) {
            return text; // unterminated tag: leave the text untouched
        }

        // Strip META tag text (substring's end index is exclusive)
        return text.substring(0, indexMeta)
                + text.substring(indexMetaEnd + 1);
    }
    // </editor-fold>
}
[/code]
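
To show what I expect the stripping to do, here is a minimal standalone sketch, separate from the class above. The class name TagStripDemo, the sample HTML and the exact patterns are made up for illustration, and regular expressions are only a rough approximation of real HTML parsing:

```java
import java.util.regex.Pattern;

public class TagStripDemo {

    // Illustrative patterns only (hypothetical, not taken from the bug report).
    // DOTALL lets '.' span line breaks; CASE_INSENSITIVE matches <META> and <meta>.
    private static final Pattern[] BAD_TAGS = {
        Pattern.compile("<script\\b[^>]*>.*?</script\\s*>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL),
        Pattern.compile("<style\\b[^>]*>.*?</style\\s*>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL),
        Pattern.compile("<meta\\b[^>]*>", Pattern.CASE_INSENSITIVE),
        Pattern.compile("<!--.*?-->", Pattern.DOTALL)
    };

    /** Returns the given HTML with every match of the patterns above removed. */
    public static String strip(String html) {
        String result = html;
        for (Pattern p : BAD_TAGS) {
            result = p.matcher(result).replaceAll("");
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<html><head>"
                + "<META http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">"
                + "<script>if (a < b) { run(); }</script></head>"
                + "<body><!-- a\ncomment --><p>Hello</p></body></html>";
        // Prints: <html><head></head><body><p>Hello</p></body></html>
        System.out.println(strip(html));
    }
}
```

The lazy `.*?` stops at the first closing tag, so a script body that itself contains a `<` character is still removed as a whole.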

However, upon running the following line:

[code]
SimpleBrowser.this.browser.cleanHTML();
[/code]

I get the following exception:

[code]
Exception in thread "Thread-2" java.lang.RuntimeException: Must insert new content into body element-
    at javax.swing.text.html.HTMLDocument$HTMLReader.generateEndsSpecsForMidInsert(HTMLDocument.java:1961)
    at javax.swing.text.html.HTMLDocument$HTMLReader.<init>(HTMLDocument.java:1908)
    at javax.swing.text.html.HTMLDocument$HTMLReader.<init>(HTMLDocument.java:1782)
    at javax.swing.text.html.HTMLDocument$HTMLReader.<init>(HTMLDocument.java:1777)
    at javax.swing.text.html.HTMLDocument.getReader(HTMLDocument.java:137)
    at javax.swing.text.html.HTMLEditorKit.read(HTMLEditorKit.java:228)
    at javax.swing.JEditorPane.read(JEditorPane.java:556)
    at javax.swing.JEditorPane$PageLoader.run(JEditorPane.java:647)
[/code]

What should I do at this point?

Thanks
Phil
