Nokogiri's HTML whitespace handling

Charlie Somerville

Nov 30, 2011, 11:10:40 PM
to nokogiri-talk
Hi all,

I'm using Nokogiri in an HTML sanitizer. I've noticed two odd
behaviors: Nokogiri discards significant whitespace in some cases, and
the indentation added by 'to_html' introduces significant whitespace,
which changes how the HTML renders.

I've worked around the latter issue by writing my own HTML formatting
method, but I can't find a way around the whitespace clobbering.

I know XML has an attribute that makes the parser preserve all
whitespace, but that attribute has no effect when parsing as HTML.

I've opened a GitHub issue about this
(https://github.com/tenderlove/nokogiri/issues/575), but I read that
this group is a better place to get support.

Thanks!
