
exception in html parser under Linux


unplug

Jun 19, 2001, 3:01:02 AM
Hi all,

The following code is copied from Tech Tip 23Sep1999. I have compiled it
and run it under Win98, where it works fine for any URI. However, when I
try to run it under Linux, it throws exceptions. I noticed that some web
sites can be parsed by the program on Linux but some can't. I wonder what
the difference between the platforms is. Can anyone tell me how to make
the program work under Linux?

Rgds,
unplug

configuration
RedHat 7.1
JDK1.3.1
Failed: java GetLinks http://java.sun.com
Worked: java GetLinks http://www.apache.org

--beginning of code
import java.io.*;
import java.net.*;
import javax.swing.text.*;
import javax.swing.text.html.*;

class GetLinks {
    public static void main(String[] args) {
        EditorKit kit = new HTMLEditorKit();
        Document doc = kit.createDefaultDocument();
        // The Document class does not yet
        // handle charsets properly.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try {
            // Create a reader on the HTML content.
            Reader rd = getReader(args[0]);

            // Parse the HTML.
            kit.read(rd, doc, 0);

            // Iterate through the elements
            // of the HTML document.
            ElementIterator it = new ElementIterator(doc);
            javax.swing.text.Element elem;
            while ((elem = it.next()) != null) {
                // Anchor attributes are stored under the HTML.Tag.A key.
                SimpleAttributeSet s = (SimpleAttributeSet)
                    elem.getAttributes().getAttribute(HTML.Tag.A);
                if (s != null) {
                    System.out.println(
                        s.getAttribute(HTML.Attribute.HREF));
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        // Exit explicitly; status 0 rather than the original 1,
        // which would signal failure to the calling shell.
        System.exit(0);
    }

    // Returns a reader on the HTML data. If 'uri' begins
    // with "http:", it's treated as a URL; otherwise,
    // it's assumed to be a local filename.
    static Reader getReader(String uri) throws IOException {
        if (uri.startsWith("http:")) {
            // Retrieve from Internet.
            URLConnection conn = new URL(uri).openConnection();
            return new InputStreamReader(conn.getInputStream());
        } else {
            // Retrieve from file.
            return new FileReader(uri);
        }
    }
}
--End of code
--Exception in Linux
Exception in thread "main" java.lang.NoClassDefFoundError
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:120)
at java.awt.Toolkit$2.run(Toolkit.java:512)
at java.security.AccessController.doPrivileged(Native Method)
at java.awt.Toolkit.getDefaultToolkit(Toolkit.java:503)
at javax.swing.text.html.CSS.getValidFontNameMapping(CSS.java:932)
at javax.swing.text.html.CSS$FontFamily.parseCssValue(CSS.java:1789)
at javax.swing.text.html.CSS.getInternalCSSValue(CSS.java:531)
at javax.swing.text.html.CSS.addInternalCSSValue(CSS.java:516)
at javax.swing.text.html.StyleSheet.addCSSAttribute(StyleSheet.java:436)
at javax.swing.text.html.HTMLDocument$HTMLReader$ConvertAction.start(HTMLDocument.java:2536)
at javax.swing.text.html.HTMLDocument$HTMLReader.handleStartTag(HTMLDocument.java:1992)
at javax.swing.text.html.parser.DocumentParser.handleStartTag(DocumentParser.java:145)
at javax.swing.text.html.parser.Parser.startTag(Parser.java:333)
at javax.swing.text.html.parser.Parser.parseTag(Parser.java:1786)
at javax.swing.text.html.parser.Parser.parseContent(Parser.java:1821)
at javax.swing.text.html.parser.Parser.parse(Parser.java:1980)
at javax.swing.text.html.parser.DocumentParser.parse(DocumentParser.java:109)
at javax.swing.text.html.parser.ParserDelegator.parse(ParserDelegator.java:74)
at javax.swing.text.html.HTMLEditorKit.read(HTMLEditorKit.java:239)
at GetLinks.main(GetLinks.java:23)
---
Posted via freenews.netfront.net
Complaints to ne...@netfront.net

Thye Chean

Jul 12, 2001, 4:29:54 AM
Hi, I found that on Linux it crashes with an error like the one you
describe if the HTML specifies a font, e.g. <font face="arial...">.

I have to filter out all the font tags before passing the HTML file to
the Swing parser.
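
A minimal sketch of that kind of filter, for illustration only. The class
name FontTagFilter and the method stripFontTags are made up here, and the
code assumes java.util.regex is available (JDK 1.4 or later, so it would
not compile on the JDK 1.3.1 mentioned in the original post). It removes
the <font> tags themselves and keeps the text between them:

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original GetLinks program.
class FontTagFilter {
    // Matches opening and closing <font> tags, in any letter case.
    private static final Pattern FONT_TAG =
        Pattern.compile("</?font\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Returns the HTML with every <font ...> and </font> tag removed;
    // the text between the tags is kept.
    static String stripFontTags(String html) {
        return FONT_TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html =
            "<a href=\"index.html\"><font face=\"arial\">Home</font></a>";
        System.out.println(stripFontTags(html));
    }
}
```

To use it with GetLinks, one would read the page into a String, call
stripFontTags, and hand a StringReader over the result to kit.read. The
pattern is deliberately simple: it does not cope with a '>' inside an
attribute value, and it does nothing about fonts specified in other ways
(for example in stylesheets).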

Thye Chean
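
Another option, offered as a sketch rather than a confirmed fix: the stack
trace above shows the failure inside HTMLDocument's reader, when it turns
a font attribute into CSS and asks for the AWT Toolkit. If only the links
are needed, the parser can be driven with a custom
HTMLEditorKit.ParserCallback instead of building an HTMLDocument, so that
code path should not be reached. LinkCollector and its methods are names
invented for this example, and the syntax assumes Java 5 or later.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical replacement for the ElementIterator loop in GetLinks.
class LinkCollector extends HTMLEditorKit.ParserCallback {
    final List<String> links = new ArrayList<String>();

    // Called by the parser for each opening tag; only <a> is of interest.
    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.A) {
            Object href = attrs.getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                links.add(href.toString());
            }
        }
    }

    // Parses the HTML from 'rd' and returns the HREF values in document order.
    static List<String> extract(Reader rd) throws IOException {
        LinkCollector collector = new LinkCollector();
        // 'true' tells the parser to ignore charset directives in the page.
        new ParserDelegator().parse(rd, collector, true);
        return collector.links;
    }

    // Convenience wrapper for HTML that is already in memory.
    static List<String> extractFromString(String html) {
        try {
            return extract(new StringReader(html));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><body><font face=\"arial\">"
            + "<a href=\"http://www.apache.org\">Apache</a></font></body></html>";
        for (String link : extractFromString(html)) {
            System.out.println(link);
        }
    }
}
```

In GetLinks, the Reader returned by getReader could be passed straight to
LinkCollector.extract in place of the kit.read call and the element loop.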

unplug <unp...@poboxes.com> wrote in message news:<3B2EF8AE...@poboxes.com>...
