
Any way to get current file position in xml.parsers.expat?


C. Laurence Gonsalves

Jul 13, 2001, 3:18:36 AM
I'm writing a little utility that uses xml.parsers.expat (this is in
Python 2.1) to parse XML. I want to be able to generate semantic error
messages, in addition to the syntactic errors that expat handles. For
example, if the value of an attribute is invalid for whatever reason,
I'd like to be able to say "Invalid attribute 'foo': line 86, column 5"
(or something along those lines)

Unfortunately, I can't find any way to get the current position in the
file when using ParseFile. I know that if expat throws an ExpatError,
that will contain the line and column number, but that doesn't help me
in this situation, as the file is 'well formed' XML as far as expat is
concerned.

So is there any way to ask the expat parser where it is in the file?
Even a byte offset from the beginning of the file would be better than
nothing.

--
C. Laurence Gonsalves
clgo...@kami.com
http://cryogen.com/clgonsal/
"Any sufficiently advanced technology is indistinguishable from magic."
-- Arthur C. Clarke

Van Gale

Jul 13, 2001, 3:30:52 AM

"C. Laurence Gonsalves" <clgo...@keeshah.penguinpowered.com> wrote in
message news:slrn9kt86i....@keeshah.penguinpowered.com...

> Unfortunately, I can't find any way to get the current position in the
> file when using ParseFile. I know that if expat throws an ExpatError,
> that will contain the line and column number, but that doesn't help me
> in this situation, as the file is 'well formed' XML as far as expat is
> concerned.
>
> So is there any way to ask the expat parser where it is in the file?
> Even a byte offset from the beginning of the file would be better than
> nothing.

I'm pretty sure ErrorColumnNumber, ErrorLineNumber, and ErrorByteIndex are
updated even when there is no error.

Van
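
A minimal sketch of the suggestion above, written for a current Python 3
interpreter rather than the Python 2.1 used in the thread. The function
name find_invalid_attributes and the is_valid callback are hypothetical
names introduced only for illustration; ParserCreate, StartElementHandler,
ParseFile and the three Error* attributes are the real expat interface.
The handler reads the Error* attributes while parsing is still in
progress, with no exception involved. (Current Python releases also
document CurrentLineNumber, CurrentColumnNumber and CurrentByteIndex for
this purpose.)

```python
import io
import xml.parsers.expat


def find_invalid_attributes(xml_bytes, is_valid):
    """Collect (attribute, line, column, byte_index) for rejected attributes.

    `is_valid` is a hypothetical callback taking (element, attribute, value)
    and returning False when the value is semantically unacceptable.
    """
    problems = []
    parser = xml.parsers.expat.ParserCreate()

    def start_element(name, attrs):
        for attr_name, value in attrs.items():
            if not is_valid(name, attr_name, value):
                # Read the position from inside the handler, while the
                # parser is still working on this start tag.
                problems.append((
                    attr_name,
                    parser.ErrorLineNumber,    # line of the start tag
                    parser.ErrorColumnNumber,  # column of the start tag
                    parser.ErrorByteIndex,     # byte offset of the start tag
                ))

    parser.StartElementHandler = start_element
    parser.ParseFile(io.BytesIO(xml_bytes))
    return problems


if __name__ == "__main__":
    document = b'<root>\n  <item foo="bad"/>\n</root>\n'
    rejected = find_invalid_attributes(
        document, lambda element, attribute, value: value != "bad")
    for attr_name, line, column, byte_index in rejected:
        print("Invalid attribute %r: line %d, column %d (byte %d)"
              % (attr_name, line, column, byte_index))
```

Note that the position reported is that of the start tag that triggered
the handler, not of the individual attribute, and that expat counts lines
from 1 but columns from 0.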

C. Laurence Gonsalves

Jul 13, 2001, 12:22:51 PM
On Fri, 13 Jul 2001 07:30:52 GMT, Van Gale <cgale1@_remove_home.com> wrote:
>
>I'm pretty sure ErrorColumnNumber, ErrorLineNumber, and ErrorByteIndex
>are updated even when there is no error.

You're right, they do seem to get updated continuously. I didn't even
try checking this because the documentation explicitly says these values
won't be valid unless the parser raises an xml.parsers.expat.ExpatError
exception.

i.e., http://python.org/doc/current/lib/xmlparser-objects.html says:

The following attributes contain values relating to the most recent
error encountered by an xmlparser object, and will only have correct
values once a call to Parse() or ParseFile() has raised a
xml.parsers.expat.ExpatError exception.

ErrorByteIndex
Byte index at which an error occurred.

...

ErrorColumnNumber
Column number at which an error occurred.

ErrorLineNumber
Line number at which an error occurred.

So is this an error in the documentation, or is the fact that these
values are continuously updated merely a "quirk" in the current behaviour
that I shouldn't rely on because it may change at some point in the
future?

I'd be much happier if the docs didn't say that I can't do what I want
to do...

Paul Prescod

Jul 13, 2001, 7:24:45 PM
to C. Laurence Gonsalves, pytho...@python.org
The behaviour isn't likely to change. The SAX handler depends on it.
--
Take a recipe. Leave a recipe.
Python Cookbook! http://www.ActiveState.com/pythoncookbook
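
For comparison, here is a rough sketch of getting the same information
through SAX, where the position is exposed via a locator object. As far as
I know, the expat-based SAX driver in the standard library implements that
locator by reading the parser's ErrorLineNumber and ErrorColumnNumber,
which is the dependency mentioned above. The class name
PositionReportingHandler and the function element_positions are
hypothetical; setDocumentLocator, getLineNumber and getColumnNumber are
part of the standard xml.sax interface. This targets a current Python 3
interpreter, not Python 2.1.

```python
import io
import xml.sax


class PositionReportingHandler(xml.sax.ContentHandler):
    """Record where each element's start tag was seen."""

    def __init__(self):
        super().__init__()
        self.locator = None
        self.positions = []

    def setDocumentLocator(self, locator):
        # The parser hands over a locator before any other event arrives.
        self.locator = locator

    def startElement(self, name, attrs):
        # Ask the locator where the parser currently is.
        self.positions.append((
            name,
            self.locator.getLineNumber(),
            self.locator.getColumnNumber(),
        ))


def element_positions(xml_bytes):
    """Return a list of (element_name, line, column) tuples."""
    handler = PositionReportingHandler()
    xml.sax.parse(io.BytesIO(xml_bytes), handler)
    return handler.positions


if __name__ == "__main__":
    for name, line, column in element_positions(b"<root>\n  <item/>\n</root>\n"):
        print("%s: line %d, column %d" % (name, line, column))
```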

Alex Martelli

Jul 14, 2001, 4:35:39 AM
"Paul Prescod" <pa...@ActiveState.com> wrote in message
news:mailman.995067084...@python.org...

> The behaviour isn't likely to change. The SAX handler depends on it.

Then it might perhaps be good to document and "nail it
down", so that other components may reliably depend
on it just as well...


Alex

