Handling &apos; in mochiweb_charref


tim

Oct 13, 2011, 8:37:32 PM
to MochiWeb
Hey,
I'm using mochiweb_html to parse web pages, and it uses
mochiweb_charref to decode HTML entities such as &amp;, but it doesn't
handle &apos;. Is there a reason not to handle the apostrophe? Does the
list of entities get updated when there are new entities to support?
Can I file a ticket to add support for the apostrophe?

Thanks!
-Tim

Bob Ippolito

Oct 13, 2011, 8:49:36 PM
to moch...@googlegroups.com
Go ahead and file an issue about it. I think I generated the list by
hand (in a text editor with regexes) from
http://www.w3.org/TR/html4/sgml/entities.html - if there's a better
list you should specify it in the issue and I'll see what I can do
about it.
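
For context, &apos; is defined by XML and XHTML 1.0 but is not in the HTML 4 entity list linked above, which would explain its absence. Until the list changes, a caller doing its own lookups could special-case it. A minimal sketch, assuming mochiweb is on the code path; the module name charref_apos is hypothetical and not part of MochiWeb:

```erlang
-module(charref_apos).
-export([charref/1]).

%% Hypothetical wrapper, not part of MochiWeb: resolve "apos" locally
%% (39 is the code point of the apostrophe) and defer every other
%% name to mochiweb_charref:charref/1.
charref("apos") -> 39;
charref(<<"apos">>) -> 39;
charref(Name) -> mochiweb_charref:charref(Name).
```

This only helps code that calls the wrapper directly. mochiweb_html does its own lookups through mochiweb_charref, so the parser's output is unchanged by it.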


Tim Kuo

Oct 13, 2011, 10:41:41 PM
to moch...@googlegroups.com
Thanks for the quick response. I've also noticed that some websites double-encode their entities (for instance, the entity for " on http://www.theonion.com/articles/last-american-who-knew-what-the-fuck-he-was-doing,26268/ and http://www.flickr.com/photos/rouvelee/3097230013/in/photostream/). Does it make sense for mochiweb to double-decode the content? Or would it be possible to expose the decoding function for others to use?

Thanks,
-Tim

Bob Ippolito

Oct 13, 2011, 10:48:51 PM
to moch...@googlegroups.com
Double decoding is a bad idea. It's a mystery to me why they have
double-encoded their meta content but I'd guess it's just bad code on
their part. Either way, you can easily handle those meta tags
separately; I can't imagine a scenario where you would actually want
to do this on the whole document.
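
Handling the meta tags separately could look roughly like the sketch below: a small decoder that is applied only to the attribute values known to be double-encoded, never to the whole document. The module entity_decode and its functions are hypothetical and not part of MochiWeb; the only MochiWeb call assumed is mochiweb_charref:charref/1, used as the default lookup.

```erlang
-module(entity_decode).
-export([decode/1, decode/2]).

%% Hypothetical helper, not part of MochiWeb.
%% Decodes character references in a binary, using
%% mochiweb_charref:charref/1 for lookups by default.
decode(Bin) ->
    decode(Bin, fun mochiweb_charref:charref/1).

%% Lookup takes the reference name (e.g. <<"amp">> or <<"#38">>) and
%% returns a code point, a list of code points, or undefined.
decode(Bin, Lookup) when is_binary(Bin), is_function(Lookup, 1) ->
    %% Because of the capture group, re:split/3 returns
    %% [Text, Name, Text, Name, ..., Text].
    Parts = re:split(Bin, <<"&(#?[A-Za-z0-9]+);">>, [{return, binary}]),
    iolist_to_binary(decode_parts(Parts, Lookup)).

decode_parts([Text], _Lookup) ->
    [Text];
decode_parts([Text, Name | Rest], Lookup) ->
    [Text, expand(Name, Lookup(Name)) | decode_parts(Rest, Lookup)].

%% Unknown or unencodable references are left in the output untouched.
expand(Name, undefined) ->
    [$&, Name, $;];
expand(Name, CodePoint) when is_integer(CodePoint) ->
    expand(Name, [CodePoint]);
expand(Name, CodePoints) when is_list(CodePoints) ->
    case unicode:characters_to_binary(CodePoints) of
        Utf8 when is_binary(Utf8) -> Utf8;
        _Error -> [$&, Name, $;]
    end.
```

Each call decodes exactly one level. Assuming the parser has already decoded a meta tag's content attribute once, passing that single value through entity_decode:decode/1 removes the second layer of encoding while the rest of the document stays as parsed.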