Esteban
Oct 20, 2009, 12:08:16 AM
to MochiWeb
Hi there. I'm new to the group, but I've been using mochiweb for a while.
I'm using mochiweb_html to parse some pages, and noticed that it
mishandles some ugly HTML constructs, such as unquoted attribute values.
Example:
> mochiweb_html:tokens("<a href=userdetails.php?id=1340>ElFantasma</a>").
[{start_tag,<<"a">>,
[{<<"href">>,<<"userdetails.php?id">>},{<<>>,<<"1340">>}],
false},
{data,<<"ElFantasma">>,false},
{end_tag,<<"a">>}]
Note that the href attribute gets split at the '=' because the value is
not quoted. With quotes, it parses correctly:
107> mochiweb_html:tokens("<a href=\"userdetails.php?id=1340\">ElFantasma</a>").
[{start_tag,<<"a">>,
[{<<"href">>,<<"userdetails.php?id=1340">>}],
false},
{data,<<"ElFantasma">>,false},
{end_tag,<<"a">>}]
I know the real problem is the poorly coded HTML, but since I have no
control over the pages I'm trying to parse, I need to modify the
mochiweb_html code.
Is there any chance of getting this kind of problem fixed?
(I think I can manage to hack the tokenize_attr_value function so that
it does not stop at the '=' character, but I'm not sure whether that
would introduce more bugs.)
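
To make the idea concrete, here is a rough standalone sketch of the behaviour I have in mind. It is not mochiweb's actual code: the module name unquoted_attr and the function scan/1 are made up for illustration. It assumes an unquoted attribute value should run until whitespace or '>' and that '=' is just another character inside the value.

```erlang
-module(unquoted_attr).
-export([scan/1]).

%% Hypothetical helper, not part of mochiweb_html.
%% scan(Bin) -> {Value, Rest}
%% Reads an unquoted attribute value from the front of Bin.
%% The value ends at whitespace, at '>', or at end of input.
%% A '=' inside the value is kept as part of the value.
scan(Bin) when is_binary(Bin) ->
    scan(Bin, <<>>).

%% Terminator found: return what was collected, leave the terminator in Rest.
scan(<<C, _/binary>> = Rest, Acc)
  when C =:= $>; C =:= $\s; C =:= $\t; C =:= $\n; C =:= $\r ->
    {Acc, Rest};
%% Any other byte (including '=') belongs to the value.
scan(<<C, Rest/binary>>, Acc) ->
    scan(Rest, <<Acc/binary, C>>);
%% End of input: everything collected is the value.
scan(<<>>, Acc) ->
    {Acc, <<>>}.
```

With this, unquoted_attr:scan(<<"userdetails.php?id=1340>ElFantasma</a>">>) returns {<<"userdetails.php?id=1340">>, <<">ElFantasma</a>">>}. The sketch deliberately ignores cases like a trailing "/>" on self-closing tags, which a real change to the tokenizer would have to consider.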
Thanks,
Esteban