
Escaping Attributes with Builder::XmlMarkup


Daniel Sheppard

Oct 27, 2005, 9:09:45 PM
Just hit a problem using FeedTools where links with ampersands in them
were being left unescaped in the output. I realised this was a
Builder::XmlMarkup thing, and patched it there, but when I went to the
Builder::XmlMarkup CVS to see what was going on there, I found this:

http://rubyforge.org/cgi-bin/viewcvs.cgi/builder/lib/builder/xmlmarkup.rb.diff?r1=1.3&r2=1.4&cvsroot=builder

It seems that a new option has already been added to Builder::XmlMarkup
to escape XML attributes, but that it defaults to false. Is there a
reason for this? I was sure that <element attr="value&value"> was not
well-formed XML, and should read <element attr="value&amp;value"> - why
is that not the default behaviour?
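
To make the point concrete, here is a minimal sketch of attribute escaping in plain Ruby. The helpers `escape_attr` and `element` are hypothetical names made up for illustration, not Builder's API; they only show what the escaped output should look like.

```ruby
# Hypothetical helper, not part of Builder: escape a string for use inside
# a double-quoted XML attribute value.
def escape_attr(value)
  value.to_s
       .gsub('&', '&amp;') # must run first so later entities are not re-escaped
       .gsub('<', '&lt;')
       .gsub('"', '&quot;')
end

# Hypothetical helper: render an empty element with escaped attribute values.
def element(name, attrs = {})
  rendered = attrs.map { |key, value| %( #{key}="#{escape_attr(value)}") }.join
  "<#{name}#{rendered}/>"
end

puts element('link', 'href' => 'http://example.com/?a=1&b=2')
# prints <link href="http://example.com/?a=1&amp;b=2"/>
```

The ampersand replacement has to come first; otherwise the `&` in `&lt;` or `&quot;` would itself be escaped a second time.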

So, which library needs to be fixed? Builder or FeedTools?

(BTW - this is affecting the xml reprocessing in the CVS version of
MouseHole - so _why you might want to keep an eye on this).


Daniel Sheppard

Oct 28, 2005, 3:54:17 AM
Further to this, the escaping performed by Builder is a bit slack. I
believe that this:

def _escape(text)
  text.
    gsub(%r{&}, '&amp;').
    gsub(%r{<}, '&lt;').
    gsub(%r{>}, '&gt;')
end

should be replaced with the more paranoid:

def _escape(text)
  text.gsub(/[^-\w\d\/\n\r _:;+=.\@*,()#]/) do |x|
    case x
    when '"' then '&quot;'
    when '\'' then '&apos;'
    when '<' then '&lt;'
    when '>' then '&gt;'
    when '&' then '&amp;'
    else
      # x[0] is the byte value on Ruby 1.8; on 1.9 and later use x.ord
      "&##{x[0]};"
    end
  end
end

This should escape everything outside the ASCII range, so that the
data isn't corrupted by invalid characters. I might have missed a
character or 10, but escaping everything is safer than not escaping.

(though it should probably be split out to only escape quotes within
attribute strings, otherwise the output can look a little messy).
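
A rough sketch of that split, assuming a modern Ruby and input that is already a valid UTF-8 string. The names `escape_with`, `escape_text` and `escape_attribute_value` are hypothetical and this is not Builder's actual implementation; it only shows quotes being escaped in attribute values while text nodes are left readable.

```ruby
# Hypothetical helpers, not Builder's actual implementation.
XML_TEXT_ENTITIES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze
XML_ATTR_ENTITIES = XML_TEXT_ENTITIES.merge('"' => '&quot;', "'" => '&apos;').freeze

# Replace markup characters found in the given table, and turn every
# non-ASCII character into a numeric character reference.
# Assumes +text+ is a validly encoded UTF-8 string.
def escape_with(text, entities)
  text.gsub(/[&<>"']|[^\x00-\x7F]/) do |char|
    if char.ord > 0x7F
      "&##{char.ord};"
    else
      # Characters missing from the table (quotes in text nodes) pass through.
      entities.fetch(char, char)
    end
  end
end

# Text nodes: quotes are left alone.
def escape_text(text)
  escape_with(text, XML_TEXT_ENTITIES)
end

# Attribute values: quotes are escaped as well.
def escape_attribute_value(text)
  escape_with(text, XML_ATTR_ENTITIES)
end

sample = "He said \"caf\u00E9\" & left"
puts escape_text(sample)            # He said "caf&#233;" &amp; left
puts escape_attribute_value(sample) # He said &quot;caf&#233;&quot; &amp; left
```

Unlike the whitelist version above, this one only touches markup characters and non-ASCII characters, so ordinary punctuation comes out unchanged.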

The hoodwink.d onslaught RSS is currently not well-formed (there's a \210
character in a couple of entries), and I was going to pass it through
MouseHole to fix it, but there were further issues there. If hoodwink.d
is using Builder::XmlMarkup (or FeedTools) to generate the onslaught
RSS, it probably needs this change to stay valid.
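
For anyone wanting to track down a stray byte like that, here is a small sketch that lists the offsets of bytes that are not valid UTF-8. It assumes the feed is meant to be UTF-8 (I have not checked what the feed actually declares), and `invalid_utf8_offsets` and the sample string are made up for illustration.

```ruby
# Hypothetical check: report the byte offsets of anything that is not
# valid UTF-8. Assumes the data is supposed to be UTF-8.
def invalid_utf8_offsets(bytes)
  offsets = []
  position = 0
  bytes.dup.force_encoding(Encoding::UTF_8).each_char do |char|
    # An invalid byte is yielded as a one-byte "character" that fails
    # the encoding check.
    offsets << position unless char.valid_encoding?
    position += char.bytesize
  end
  offsets
end

# Made-up sample: a valid two-byte character followed by a lone \210 (0x88).
feed = "<title>caf\xC3\xA9 \x88 bar</title>".b
p invalid_utf8_offsets(feed) # => [13]
```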
