http://rubyforge.org/cgi-bin/viewcvs.cgi/builder/lib/builder/xmlmarkup.rb.diff?r1=1.3&r2=1.4&cvsroot=builder
It seems that a new option has already been added to Builder::XmlMarkup
to escape XML attributes, but that it defaults to false. Is there a
reason for this? I was sure that <element attr="value&value"> was not
well-formed XML, and should read <element attr="value&amp;value"> - so
why is escaping not the default behaviour?
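
To make the attribute case concrete, here is a minimal sketch of escaping a value by hand before it goes into a double-quoted attribute. The names escape_attr and ATTR_ESCAPES are made up for illustration and are not part of Builder or FeedTools.

```ruby
# Hypothetical helper (not Builder's API): escape a string so it can sit
# inside a double-quoted XML attribute value.
ATTR_ESCAPES = {
  '&' => '&amp;',
  '<' => '&lt;',
  '>' => '&gt;',
  '"' => '&quot;'
}.freeze

def escape_attr(value)
  # gsub with a Hash replaces each match with the value stored under it.
  value.gsub(/[&<>"]/, ATTR_ESCAPES)
end

tag = %(<element attr="#{escape_attr('value&value')}">)
puts tag  # prints <element attr="value&amp;value">
```

Without a step like this, a bare & in the attribute makes the whole document not well-formed.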
So, which library needs to be fixed? Builder or FeedTools?
(BTW - this is affecting the XML reprocessing in the CVS version of
MouseHole - so _why, you might want to keep an eye on this).
def _escape(text)
  text.
    gsub(%r{&}, '&amp;').
    gsub(%r{<}, '&lt;').
    gsub(%r{>}, '&gt;')
end
should be replaced with the more paranoid:
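def _escape(text)
  # Escape every character that is not in the whitelist below.
  text.gsub(/[^-\w\d\/\n\r _:;+=.\@*,()#]/) do |x|
    case x
    when '"'  then '&quot;'
    when '\'' then '&apos;'
    when '<'  then '&lt;'
    when '>'  then '&gt;'
    when '&'  then '&amp;'
    else
      # x[0] is the character code on Ruby 1.8; use x.ord on 1.9 and later.
      "&##{x[0]};"
    end
  end
end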
This should escape everything outside a small whitelist of ASCII
characters, to ensure the data isn't corrupted by invalid characters. I
might have missed a character or ten, but escaping too much is safer
than escaping too little. (It should probably be split out so that
quotes are only escaped within attribute strings, though; otherwise the
output can look a little messy.)
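
A rough sketch of that split, assuming valid UTF-8 input on a current Ruby. All of the names here (XML_TEXT_MAP, XML_ATTR_MAP, numeric_refs, escape_text, escape_attr_value) are hypothetical and not Builder's own; it also only turns non-ASCII characters into numeric references rather than using the full whitelist above.

```ruby
# Hypothetical helpers, not Builder's code. Text nodes only need & < >
# escaped; attribute values additionally need their quotes escaped.
XML_TEXT_MAP = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze
XML_ATTR_MAP = XML_TEXT_MAP.merge('"' => '&quot;', "'" => '&apos;').freeze

# Replace every non-ASCII character with a numeric character reference.
def numeric_refs(text)
  text.gsub(/[^\x00-\x7F]/) { |ch| "&##{ch.ord};" }
end

# Escape for element content: quotes are left alone to keep output readable.
def escape_text(text)
  # Markup characters first, so the & in "&#233;" is not escaped again.
  numeric_refs(text.gsub(/[&<>]/, XML_TEXT_MAP))
end

# Escape for attribute values: same as text, plus both kinds of quote.
def escape_attr_value(text)
  numeric_refs(text.gsub(/[&<>"']/, XML_ATTR_MAP))
end

puts escape_text(%q(say "hi" & <go>))
puts escape_attr_value(%q(say "hi" & <go>))
```

With this split, element content keeps its quotes as-is and only attribute values get the noisier output.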
The hoodwink.d onslaught RSS is currently not well-formed (there's a \210
character in a couple of entries), and I was going to pass it through
MouseHole to fix it, but there were further issues there. If hoodwink.d
is using Builder::XmlMarkup (or FeedTools) to generate the onslaught
RSS, it probably needs this change to stay valid.
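
For the stray \210 byte specifically, here is a small sketch of one way to neutralise it on a current Ruby. It assumes the feed is meant to be UTF-8, which I haven't confirmed, and escape_stray_bytes is a made-up name rather than anything in Builder, FeedTools or MouseHole.

```ruby
# Hypothetical helper: turn bytes that are not valid UTF-8 into numeric
# character references, in the same spirit as the "&##{x[0]};" branch above.
# Assumption: the input is supposed to be UTF-8.
def escape_stray_bytes(str)
  utf8 = str.dup.force_encoding(Encoding::UTF_8)
  # String#scrub yields each invalid byte sequence to the block and
  # substitutes whatever the block returns.
  utf8.scrub { |bad| bad.bytes.map { |byte| "&##{byte};" }.join }
end

entry = "onslaught \210 entry"   # \210 is octal for byte 0x88
puts escape_stray_bytes(entry)   # prints onslaught &#136; entry
```

Whether a numeric reference or a plain replacement character is the better repair depends on what the byte was meant to be, which the feed alone doesn't tell us.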