ParseError: Junk after Document message


iaine...@googlemail.com

Mar 31, 2008, 9:30:13 AM
to Genshi
I'm a newbie to both Python and Genshi, so I'm on a steep learning
curve.

I'm working on porting the openShakespeare code (the Shakespeare package
on PyPI) over to the works of Milton, but have run into a Genshi error.
Whilst I understand the message itself, I'm not sure whether there
is a workaround to discover what the junk is.

The error message received is from genshi\input.py:
raise ParseError(msg, self.filename, e.lineno, e.offset)
ParseError: junk after document element: line 2, column 0

In theory, the code is supposed to transform a text file and add
line numbers to it (rather than just showing the plain view, which
works fine), but it hasn't worked yet.

Is there a workaround for tracking down the problem, or a best-practice
way of "sanitising" the file?

Thanks.

Matt Good

Mar 31, 2008, 11:41:01 PM
to Genshi
On Mar 31, 6:30 am, iainems...@googlemail.com wrote:
> The error message received is from genshi\input.py:
> raise ParseError(msg, self.filename, e.lineno, e.offset)
> ParseError: junk after document element: line 2, column 0

You can't have any content after closing the root element of your
document. For example:

>>> import xml.parsers.expat
>>> p = xml.parsers.expat.ParserCreate()
>>> p.Parse('<root></root>junk')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
xml.parsers.expat.ExpatError: junk after document element: line 1, column 13

So, remove any text after the last element in your document (usually
</html>).
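To answer the original question of how to see what the junk actually is: expat's error carries a line number and a column offset, so a small helper can pull out the offending source line. This is a minimal sketch in Python 3 using the same standard xml.parsers.expat module as the session above; find_junk is a hypothetical name, not part of Genshi or the standard library.

```python
import xml.parsers.expat


def find_junk(text):
    """Locate the first well-formedness error in `text`.

    Hypothetical helper (not a Genshi or stdlib API). Returns a tuple
    (line, column, offending_line) on failure, or None if `text` parses.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        # isfinal=True tells expat this string is the complete document
        parser.Parse(text, True)
    except xml.parsers.expat.ExpatError as e:
        lines = text.splitlines()
        # expat line numbers are 1-based; the column offset is 0-based
        bad_line = lines[e.lineno - 1] if e.lineno <= len(lines) else ''
        return e.lineno, e.offset, bad_line
    return None


two_roots = '<pre id="0">a</pre>\n<pre id="1">b</pre>\n'
print(find_junk(two_roots))
```

For two sibling <pre> elements, like the output of the line-numbering code later in this thread, this prints (2, 0, '<pre id="1">b</pre>'): the same "line 2, column 0" position as the original error.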

-- Matt

iaine...@googlemail.com

Apr 1, 2008, 11:03:06 AM
to Genshi
Thanks for this. On closer inspection, it looks like the xi:include is
pulling in the HTML markup from the included file as well, so I get two
sets of HTML headers and two </html> tags.

I'm not sure this should happen, and I'm also trying to get the page to
pass W3C validation.

Is there a way of including a file that contains markup but no <html>
tags? It is really a footer template that is used across several
pages.

iaine...@googlemail.com

Apr 1, 2008, 11:14:08 AM
to Genshi
All sorted out now. Ta for the pointers.

iaine...@googlemail.com

Apr 7, 2008, 8:41:11 AM
to Genshi
No, mailed too early. The code I'm working with is:
class TextFormatterLineno(TextFormatter):
    """Format the text to have line numbers.
    """

    def format(self, file):
        self.file = file
        result = ''
        count = 0
        for line in self.file.readlines():
            tlineno = unicode(count).ljust(4)  # assume line no < 10000
            tline = unicode(line, 'utf-8').rstrip()
            tline = self.escape_chars(tline)
            result += u'<pre id="%s">%s %s</pre>\n' % (count, tlineno,
                                                        tline)
            count += 1
        return result
The resulting XHTML is then passed to:
    thtml = genshi.XML(ttext)
which is where the "junk after document element" error comes in.

Is there a Genshi method to transform the HTML, or a better alternative
to genshi.XML, that would make the markup acceptable to the parser?

Iain


Tim Hatch

Apr 7, 2008, 8:54:51 AM
to gen...@googlegroups.com
On Apr 7, 2008, at 7:41 AM, iaine...@googlemail.com wrote:

> result += u'<pre id="%s">%s %s</pre>\n' % (count, tlineno,
> tline)
> count += 1
> return result
> which passes the XHTML to the line:
> thtml = genshi.XML(ttext)
> which is where the Junk after Document element comes in.

Yes, because you don't have a root element. If you intend that to
work, you'd need to have something like a <div> around the whole thing.
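Tim's fix can be checked outside Genshi. The sketch below is Python 3 and uses the standard xml.etree.ElementTree as a stand-in for genshi.XML; that substitution is an assumption, justified only by the fact that both sit on top of expat and, as the original traceback shows, reject more than one top-level element. parses is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def parses(markup):
    """Return True if `markup` is one well-formed XML document."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Two sibling <pre> elements, like the output of format(): no single root.
fragment = '<pre id="0">0    first</pre>\n<pre id="1">1    second</pre>\n'

print(parses(fragment))                       # False: junk after document element
print(parses('<div>' + fragment + '</div>'))  # True: one <div> wraps everything
```

Applied to format(), this amounts to emitting <div> before the loop's output and </div> after it.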

> Is there a Genshi method to either transform the HTML or a better line
> that genshi.XML which would make the markup machine readably happy?

See http://genshi.edgewall.org/wiki/ApiDocs/0.4.x/genshi.builder
which will also handle escaping <, >, & for you.
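The idea behind the builder suggestion is to construct the markup as element objects instead of concatenating strings, so nesting and escaping come out right automatically. Genshi's builder module (its tag object) is the tool linked above; because its details may differ between Genshi versions, the sketch below shows the same idea with Python 3's standard xml.etree.ElementTree instead. format_with_linenos is a hypothetical replacement for the format() method earlier in the thread, not code from the Shakespeare package.

```python
import xml.etree.ElementTree as ET


def format_with_linenos(lines):
    """Build one <div> holding a numbered <pre> per input line.

    Text is assigned as element content, so ElementTree escapes <, > and &
    when serialising; no separate escape_chars step is needed.
    """
    root = ET.Element('div')  # single root element, so no "junk" error
    for count, line in enumerate(lines):
        pre = ET.SubElement(root, 'pre', id=str(count))
        # same "number padded to 4 chars, space, text" layout as the original
        pre.text = '%s %s' % (str(count).ljust(4), line.rstrip())
    return ET.tostring(root, encoding='unicode')


print(format_with_linenos(["Of Man's first disobedience & the fruit", 'if a < b']))
# prints: <div><pre id="0">0    Of Man's first disobedience &amp; the fruit</pre><pre id="1">1    if a &lt; b</pre></div>
```

Because the output always has a single <div> root, it should be accepted by genshi.XML (or any XML parser) without the junk-after-document-element error.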

Tim
