http-client: how should one handle tidy parse errors?

mpilman

Jan 7, 2010, 5:52:17 AM
to EXPath
Hi all,

What should we do when tidy is unable to parse a file? The spec does
not say anything about that. I think there are three
possibilities:

1. Throw an error
2. Return a string instead of a tidied XML node
3. Return a string instead of a tidied XML node and add a node to the
http:response node indicating that parsing the response failed.
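
To make the three options concrete, here is a minimal Python sketch. It is
not the http-client API: `parse_body`, `ParseFailure`, the `strategy` values
and the `parse-failed` key are all made up for illustration, and strict XML
parsing stands in for tidy.

```python
import xml.etree.ElementTree as ET

# Hypothetical names throughout; this only illustrates the three options.
STRATEGIES = ("error", "string", "flag")


class ParseFailure(Exception):
    """Raised under option 1 when the body cannot be parsed."""


def parse_body(text, strategy="error"):
    """Return (body, metadata) for a response body.

    "error"  - option 1: raise ParseFailure
    "string" - option 2: fall back to the raw string
    "flag"   - option 3: fall back to the raw string and record the failure
    """
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    metadata = {}
    try:
        return ET.fromstring(text), metadata
    except ET.ParseError as exc:
        if strategy == "error":
            raise ParseFailure(str(exc)) from exc
        if strategy == "flag":
            # Stand-in for the extra node in http:response.
            metadata["parse-failed"] = str(exc)
        return text, metadata
```

With option 2 the caller has to check the type of the body to notice the
failure; with option 3 the failure is visible in the metadata as well.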

Personally I would prefer option 3, but since nothing is specified
and I don't want to change the http:response node in a way that makes
it non-conformant to the spec, I currently do option 2.

Could the spec please state how this should be handled? (The same
problem exists with XML parsing errors.)

Markus

Florent Georges

Jan 7, 2010, 11:37:05 PM
to exp...@googlegroups.com
2010/1/7 mpilman wrote:

Hi Markus,

> What should we do when tidy is unable to parse a file? The
> spec does not say anything about that.

Right. In general, the spec does not say much about errors
for now; this still has to be improved. Thanks for pointing this
one out, and please do not hesitate to report any others.

> I think there are three possibilities:

> 1. Throw an error

That's what I'd do.

> 2. Return a string instead of a tidied XML node
> 3. Return a string instead of a tidied XML node and add a node
> to the http:response node indicating that parsing the
> response failed.

I don't really like those last two possibilities. If you
access HTML from within XPath, that's presumably because you need
to process the HTML tree. If a tidy library fails to parse the
HTML, I guess there is nothing you can do to recover from the
error. And if you can handle the text representation, you can
probably use @override-content-type (which does not cope well
with multipart responses but, well, I am not sure multipart
responses are very frequent).
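
As a rough illustration of that split (an error by default, text only
when the caller explicitly asks for it), here is a Python sketch. It is
not the http-client API: `read_body`, `is_xml_type` and their parameters
are hypothetical, and strict XML parsing stands in for tidy.

```python
import xml.etree.ElementTree as ET


def is_xml_type(media_type):
    """True for the XML media types this sketch chooses to parse."""
    return media_type in ("application/xml", "text/xml") or media_type.endswith("+xml")


def read_body(text, content_type, override_content_type=None):
    # Hypothetical helper: the override, when given, wins over the
    # content type announced by the server.
    effective = override_content_type or content_type
    media_type = effective.split(";")[0].strip().lower()
    if is_xml_type(media_type):
        # No fallback: a malformed body raises ET.ParseError.
        return ET.fromstring(text)
    # Anything else is handed back as plain text.
    return text
```

A caller who can live with the text would pass
`override_content_type="text/plain"` and never hit the parser; everyone
else gets either a tree or an error.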

> Personally I would prefer option 3, but since nothing is
> specified and I don't want to change the http:response node in a
> way that makes it non-conformant to the spec, I currently do
> option 2.

I cannot imagine a use case where a user accesses an HTML
resource as a document node and is OK with getting it back
as a string when it is not well-formed. Do you have a specific
use case in mind?

> Could the spec please state how this should be handled? (The
> same problem exists with XML parsing errors.)

Yes. And I would also simply throw an error in that case.

Changes are accumulating for the HTTP Client; I should really
try to release a new version of the draft quite soon.

Regards,

--
Florent Georges
http://www.fgeorges.org/

Florent Georges

Jan 9, 2010, 12:20:22 AM
to exp...@googlegroups.com
2010/1/7 mpilman wrote:

> What should we do when tidy is unable to parse a file? The spec
> does not say anything about that.

The new revision of the spec should clarify that point:
http://www.expath.org/modules/http-client.html. Could you confirm
this is ok for you?
