Above is the start of an HTML file I am trying to parse using Java's
HTMLEditorKit. When I run the program I get the following output:
At position:64 we have the comment: saved from
url=(0038)http://www.mydigiguide.com/dgx/wbl.dll
At position: 134 We have start Tag: html
At position: 140 We have start Tag: head
At position: 146 We have start Tag: title
At position: 153 We have text: DigiGuide: The Best TV Guide - myDigiGuide
Online Listings
At position: 211 We have end Tag: title
At position: 294 We have error msg: req.att contentmeta?
At position: 221 We have end Tag: head
At position: 221 We have end Tag: html
At position: 221 We have start Tag: html
Attribute: _implied_
At position: 221 We have start Tag: head
Attribute: _implied_
At position: 221 We have end Tag: head
At position: 221 We have start Tag: body
Attribute: _implied_
At position: 294 We have error msg: ioexception???
At position: 221 We have end Tag: body
At position: 221 We have end Tag: html
IO Exception
Can anyone tell me why I am getting this exception, or how I can fix it?
Thank you.
If you need any further info or the code, please ask.
James Gralton <jim...@yahoo.com> wrote:
[...]
> At position: 134 We have start Tag: html
> At position: 140 We have start Tag: head
> At position: 146 We have start Tag: title
> At position: 153 We have text: DigiGuide: The Best TV Guide - myDigiGuide
> Online Listings
> At position: 211 We have end Tag: title
> At position: 294 We have error msg: req.att contentmeta?
> At position: 221 We have end Tag: head
> At position: 221 We have end Tag: html
> At position: 221 We have start Tag: html
Now this looks as if the HTML is severely broken.
Please post the complete HTML document (or a shortened version that
still shows the error).
> Attribute: _implied_
> At position: 221 We have start Tag: head
> Attribute: _implied_
> At position: 221 We have end Tag: head
> At position: 221 We have start Tag: body
> Attribute: _implied_
> At position: 294 We have error msg: ioexception???
What is position "294"? Where does the Reader come from?
What is the IOException's stack trace?
The HTML parser also gets quite confused if any of the handleXXX methods
throw (unchecked) exceptions.
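
To illustrate the point about handleXXX methods, here is a minimal sketch of one way to keep unchecked exceptions from escaping into the parser. SafeCallback is a hypothetical wrapper class, not part of Swing, and the deliberately failing delegate and the sample HTML string are made up for the demo. It forwards a few events to another callback and records any RuntimeException instead of letting it propagate.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical wrapper (not part of Swing): forwards parser events to another
// callback and records unchecked exceptions instead of letting them escape.
class SafeCallback extends HTMLEditorKit.ParserCallback {
    private final HTMLEditorKit.ParserCallback delegate;
    final List<String> failures = new ArrayList<>();

    SafeCallback(HTMLEditorKit.ParserCallback delegate) {
        this.delegate = delegate;
    }

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        try {
            delegate.handleStartTag(t, a, pos);
        } catch (RuntimeException e) {
            failures.add("start tag " + t + " at " + pos + ": " + e);
        }
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        try {
            delegate.handleEndTag(t, pos);
        } catch (RuntimeException e) {
            failures.add("end tag " + t + " at " + pos + ": " + e);
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        try {
            delegate.handleText(data, pos);
        } catch (RuntimeException e) {
            failures.add("text at " + pos + ": " + e);
        }
    }

    // Demo: the delegate deliberately throws from handleText.
    static List<String> runDemo() throws IOException {
        HTMLEditorKit.ParserCallback buggy = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                throw new IllegalStateException("bug in handler");
            }
        };
        SafeCallback safe = new SafeCallback(buggy);
        new ParserDelegator().parse(
                new StringReader("<html><body><p>Hi</p></body></html>"), safe, true);
        return safe.failures;
    }

    public static void main(String[] args) throws IOException {
        // Print whatever handler failures were recorded during the parse.
        runDemo().forEach(System.out::println);
    }
}
```

Only three of the callback methods are wrapped here; a real wrapper would presumably cover handleSimpleTag, handleComment and handleError the same way.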
Christian
James Gralton <jim...@yahoo.com> wrote in message news:...
Here is the error with the stack trace:
C:\jbuilder5\jdk1.3\bin\javaw -classpath "C:\Documents and Settings\James Gralton\My Documents\Work\Project\htmlEditor\htmlEditor\classes;C:\jbuilder5\jdk1.3\demo\jfc\Java2D\Java2Demo.jar;C:\jbuilder5\jdk1.3\jre\lib\i18n.jar;C:\jbuilder5\jdk1.3\jre\lib\jaws.jar;C:\jbuilder5\jdk1.3\jre\lib\rt.jar;C:\jbuilder5\jdk1.3\jre\lib\sunrsasign.jar;C:\jbuilder5\jdk1.3\lib\dt.jar;C:\jbuilder5\jdk1.3\lib\tools.jar" htmleditor.Editor
At position:64 we have the comment: saved from
url=(0038)http://www.mydigiguide.com/dgx/wbl.dll
At position: 134 We have start Tag: html
At position: 140 We have start Tag: head
At position: 146 We have start Tag: title
At position: 153 We have text: DigiGuide: The Best TV Guide - myDigiGuide
Online Listings
At position: 211 We have end Tag: title
At position: 294 We have error msg: req.att contentmeta?
At position: 221 We have end Tag: head
At position: 221 We have end Tag: html
At position: 221 We have start Tag: html
At position: 221 We have start Tag: head
At position: 221 We have end Tag: head
At position: 221 We have start Tag: body
At position: 294 We have error msg: ioexception???
At position: 221 We have end Tag: body
At position: 221 We have end Tag: html
IO Exception
javax.swing.text.ChangedCharSetException
    at javax.swing.text.html.parser.DocumentParser.handleEmptyTag(DocumentParser.java:172)
    at javax.swing.text.html.parser.Parser.startTag(Parser.java:327)
    at javax.swing.text.html.parser.Parser.parseTag(Parser.java:1786)
    at javax.swing.text.html.parser.Parser.parseContent(Parser.java:1821)
    at javax.swing.text.html.parser.Parser.parse(Parser.java:1980)
    at javax.swing.text.html.parser.DocumentParser.parse(DocumentParser.java:109)
    at javax.swing.text.html.parser.ParserDelegator.parse(ParserDelegator.java:74)
    at htmleditor.Editor.main(Editor.java:32)
The full HTML file is long, so I have attached it here. The error
occurs on about the third line. Also, I am not sure what you mean by "where
does the Reader come from", so I have posted the Java files.
Position 294 is the 294th character in the HTML file; it is the first
instance of the word META.
Thank you for your help, it is much appreciated.
James Gralton <jim...@yahoo.com> wrote:
> At position: 134 We have start Tag: html
> At position: 140 We have start Tag: head
> At position: 146 We have start Tag: title
> At position: 153 We have text: DigiGuide: The Best TV Guide - myDigiGuide
> Online Listings
> At position: 211 We have end Tag: title
> At position: 294 We have error msg: req.att contentmeta?
> At position: 221 We have end Tag: head
> At position: 221 We have end Tag: html
> At position: 221 We have start Tag: html
There still must be something incorrect so that "html" is closed and
reopened again.
> IO Exception
> javax.swing.text.ChangedCharSetException
Pass "true" as third argument to parse().
> The full html file is long so I have attatched it here. But the error is
I can't find it ...?
> And position 294 is the 294th character in the HTML file it is the first
> instance of the word META.
<meta http-equiv="Content-Type" content="text/html; charset=XXX">
causes the parser to throw ChangedCharSetException.
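
As a rough sketch of the suggestion above (passing "true" as the third argument, ignoreCharSet, of ParserDelegator.parse), the example below parses a small in-memory document containing such a META tag, once with the flag off and once with it on. The class name, the SAMPLE string and the tag-collecting callback are assumptions made for the demo, not the original poster's code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.ChangedCharSetException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class IgnoreCharsetParse {
    // Made-up sample document with a charset directive in a META tag.
    static final String SAMPLE =
            "<html><head><title>Test</title>"
            + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=iso-8859-1\">"
            + "</head><body><p>Hello</p></body></html>";

    // Collects the names of the start tags and simple tags the parser reports.
    static List<String> tagNames(String html, boolean ignoreCharSet) throws IOException {
        final List<String> names = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                names.add(t.toString());
            }

            @Override
            public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                names.add(t.toString());
            }
        };
        // The third argument tells the parser whether to ignore charset directives.
        new ParserDelegator().parse(new StringReader(html), callback, ignoreCharSet);
        return names;
    }

    public static void main(String[] args) throws IOException {
        try {
            tagNames(SAMPLE, false);
        } catch (ChangedCharSetException e) {
            // ChangedCharSetException is a subclass of IOException.
            System.out.println("Without the flag: " + e);
        }
        System.out.println("With the flag: " + tagNames(SAMPLE, true));
    }
}
```

If the declared charset actually matters, the alternative would be to catch ChangedCharSetException, read getCharSetSpec(), and re-open the input with a matching Reader; that is not shown here.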
Christian
//Get the kit
HTMLEditorKit kit = getKit();
// Create default doc
HTMLDocument doc = (HTMLDocument)kit.createDefaultDocument();
doc.putProperty( "IgnoreCharsetDirective", Boolean.TRUE );
Reader r = new InputStreamReader( new FileInputStream( .....
kit.read( r, doc, 0 );
The important line here is the one that sets the IgnoreCharsetDirective property.
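
For reference, a self-contained sketch of the same idea. It makes two assumptions: getKit() is replaced by a plain new HTMLEditorKit(), and the elided FileInputStream is replaced by a StringReader over a made-up sample string, so that the example runs without an input file. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

class IgnoreCharsetDirectiveDemo {
    // Made-up sample document with a charset directive in a META tag.
    static final String SAMPLE =
            "<html><head><title>Test</title>"
            + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=iso-8859-1\">"
            + "</head><body><p>Hello</p></body></html>";

    // Loads the HTML into an HTMLDocument and returns the document's plain text.
    static String loadText(String html) throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Ask the kit to ignore charset directives while reading.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        kit.read(new StringReader(html), doc, 0);
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws IOException, BadLocationException {
        System.out.println(loadText(SAMPLE));
    }
}
```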
hth,
Craig
"James Gralton" <jim...@yahoo.com> wrote in message
news:QwRt9.648$Ao.58838@newsfep2-gui...
Much appreciated
James
Christian Kaufhold <use...@chka.de> wrote in message
news:3t3db80eb...@simia.chka.de...
Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation