Bug in ie.all_tag_attributes

Christian Fraenkel <semiharge@gmail.com>

Dec 12, 2007, 6:45:45 AM
to Watir General
I've been accessing a site that shows an inline div advertisement.
Whenever that advertisement appears I get a REXML parsing error, and an
error.xml file gets generated.
I do not have access to the original HTML that triggers the error, as
this specific ad appears only sporadically and is generated on the fly
by JavaScript; what I can provide is the error.xml, though.
I've taken a look at the source, and I see the following problem.
The tag that generates the error is the following:
<a title="Beim nächsten Start öffnen"
onclick="this.style.behavior='url(#default#homepage)';this.setHomePage(adUri);"
href="#" false;?="false;?" return="return" >
the offending attribute false;?="false;?" makes the rexml parser throw
the following exception:

C:/ruby/lib/ruby/1.8/rexml/parsers/treeparser.rb:90:in `parse':
#<REXML::ParseException: malformed XML: missing tag start
(REXML::ParseException)
Line:
Position:
Last 80 unconsumed characters:
<a title="Beim nÃ¤chsten Start Ã¶ffnen"
onclick="this.style.behavior='url(#defau>
C:/ruby/lib/ruby/1.8/rexml/parsers/baseparser.rb:346:in `pull'
C:/ruby/lib/ruby/1.8/rexml/parsers/treeparser.rb:21:in `parse'
C:/ruby/lib/ruby/1.8/rexml/document.rb:190:in `build'
C:/ruby/lib/ruby/1.8/rexml/document.rb:45:in `initialize'
C:/ruby/lib/ruby/gems/1.8/gems/watir-1.5.3/./watir/ie.rb:683:in `new'
C:/ruby/lib/ruby/gems/1.8/gems/watir-1.5.3/./watir/ie.rb:683:in
`create_rexml_document_object'
C:/ruby/lib/ruby/gems/1.8/gems/watir-1.5.3/./watir/ie.rb:663:in
`rexml_document_object'
C:/ruby/lib/ruby/gems/1.8/gems/watir-1.5.3/./watir/ie.rb:867:in
`elements_by_xpath'
C:/ruby/lib/ruby/gems/1.8/gems/watir-1.5.3/./watir/ie.rb:860:in
`element_by_xpath'
./pages.rb:75:in `getRessources'
controller.rb:19
...
malformed XML: missing tag start
Line:
Position:
Last 80 unconsumed characters:
<a title="Beim nÃ¤chsten Start Ã¶ffnen"
onclick="this.style.behavior='url(#defau
Line:
Position:
Last 80 unconsumed characters:
<a title="Beim nÃ¤chsten Start Ã¶ffnen"
onclick="this.style.behavior='url(#defau from C:/ruby/lib/ruby/1.8/
rexml/document.rb:190:in `build'
from C:/ruby/lib/ruby/1.8/rexml/document.rb:45:in `initialize'
from C:/ruby/lib/ruby/gems/1.8/gems/watir-1.5.3/./watir/ie.rb:683:in
`new'
from C:/ruby/lib/ruby/gems/1.8/gems/watir-1.5.3/./watir/ie.rb:683:in
`create_rexml_document_object'
from C:/ruby/lib/ruby/gems/1.8/gems/watir-1.5.3/./watir/ie.rb:663:in
`rexml_document_object'
from C:/ruby/lib/ruby/gems/1.8/gems/watir-1.5.3/./watir/ie.rb:867:in
`elements_by_xpath'
from C:/ruby/lib/ruby/gems/1.8/gems/watir-1.5.3/./watir/ie.rb:860:in
`element_by_xpath'
from ./pages.rb:75:in `getRessources'
from controller.rb:19

(Part of) the error lies in the regular expression used to check the
attribute name, found in ie.rb at line 762 in all_tag_attributes:
if tokens[count] =~ /^(\w|_|:)(.*)$/
The "(.*)" it uses does not match the definition of "(NameChar)*" used
in the XML spec for attribute names: .* matches the offending ;? in
false;? whereas (NameChar)* doesn't.
The solution is to make the regular expression used in ie.rb more
restrictive. I'll take a look at that later, after dinner.
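
To illustrate the point above, here is a minimal sketch that runs the
quoted check against a few attribute-name tokens. The token list is
made up for illustration (loosely modelled on the tag in this report);
only the regular expression is taken from ie.rb as quoted.

```ruby
# The check quoted from ie.rb: one leading name character, then anything.
PERMISSIVE = /^(\w|_|:)(.*)$/

# Hypothetical sample tokens, not taken from the real page.
tokens = ['title', 'href', 'onclick', 'xml:lang', 'false;?', 'return']

# Keep every token the check would treat as a valid attribute name.
accepted = tokens.select { |token| token =~ PERMISSIVE }

# All six tokens pass, including "false;?".
puts accepted.inspect
```

Because "(.*)" accepts any trailing characters, the check only ever
constrains the first character of the name.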

A related question:
how are bug reports handled here? Should I have posted this straight
to the Jira tracker, or ...?

thanks in advance

Christian

Željko Filipin

Dec 12, 2007, 6:58:13 AM
to watir-...@googlegroups.com
Hi Christian,

Comments are inline.


On Dec 12, 2007 12:45 PM, Christian Fraenkel <semi...@gmail.com> wrote:
> C:/ruby/lib/ruby/1.8/rexml/parsers/treeparser.rb:90:in `parse':
> Last 80 unconsumed characters:
> <a title="Beim nÃ¤chsten Start Ã¶ffnen"

I had problems parsing XML when it contained non-English characters. For more information see
http://zeljkofilipin.com/2006/03/15/utf-8-and-ruby/


> how are bug reports handled here, should I have tried to post this
> straight on the jiira tracker or .. .?

Since this looks like a REXML bug, there is no point in creating a Watir ticket, right?

Željko
--
ZeljkoFilipin.com

Christian Fraenkel <semiharge@gmail.com>

Dec 12, 2007, 7:40:53 AM
to Watir General


On Dec 12, 12:58 pm, "Željko Filipin" <zeljko.fili...@gmail.com>
wrote:
> Hi Christian,
>
> Comments are inline.
>
> On Dec 12, 2007 12:45 PM, Christian Fraenkel <semiha...@gmail.com> wrote:
> > C:/ruby/lib/ruby/1.8/rexml/parsers/treeparser.rb:90:in `parse':
> > Last 80 unconsumed characters:
> > <a title="Beim nÃ¤chsten Start Ã¶ffnen"
>
> I had problems with parsing xml when there was non-English characters. For
> more information see http://zeljkofilipin.com/2006/03/15/utf-8-and-ruby/

I ran into the problem you describe yesterday and had already found
the fix you posted for it (read: my code currently contains those
lines and still throws the error).

> > how are bug reports handled here, should I have tried to post this
> > straight on the jiira tracker or .. .?
>
> Since this looks like REXML bug, there is no point in creating Watir ticket,
> right?

It's not a REXML bug. REXML has every right to throw an error here, as
Watir is not feeding it correct XML.
In my case Watir is generating XML that contains ';' and '?' as part of
an attribute name. This is incorrect XML and has to be fixed on the
Watir side.
Watir uses /^(\w|_|:)(.*)$/ to accept attribute names. This is too
"open"; it needs to be much more restrictive, perhaps along the lines of
/^([a-zA-Z_:])([\w.-_:]*)$/. This is not perfect (it probably excludes
lots and lots of Unicode characters that are valid in attribute names),
but it is correct insofar as it does not allow any forbidden characters
(and as such, REXML should never throw an error about that anymore).
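
The claim that REXML is right to reject such markup can be checked in
isolation. The following is a rough sketch, not Watir code: the helper
parses_as_xml? and both sample tags are hypothetical, and the tags are
self-closed so that the attribute name is the only thing wrong with the
second one. The exact error message depends on the REXML version.

```ruby
require 'rexml/document'

# Hypothetical helper: true when REXML can parse the markup,
# false when it raises a parse error.
def parses_as_xml?(markup)
  REXML::Document.new(markup)
  true
rescue REXML::ParseException
  false
end

# Same tag twice; the second carries the attribute name from this report.
good = '<a title="ok" href="#" />'
bad  = '<a title="ok" href="#" false;?="false;?" return="return" />'

puts parses_as_xml?(good)
puts parses_as_xml?(bad)
```

Only the second tag is rejected, which suggests the attribute name, and
not the rest of the markup, is what the parser objects to.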

Željko Filipin

Dec 12, 2007, 11:33:24 AM
to watir-...@googlegroups.com
It looks to me like you have good evidence that this is a Watir bug. A nice way to get it fixed is to submit a Jira ticket. A better way is to fix it and submit a patch.

Željko

Christian Fraenkel <semiharge@gmail.com>

Dec 12, 2007, 11:58:25 AM
to Watir General
Turns out my earlier post was not correct, as my own regular expression
didn't fix my problem.
The correct regular expression to fix this bug is
/^[\w:][\w0-9.\-:]*$/

The whole block then looks like this
(watir/ie.rb:761):
# If attribute name is valid. Refer: http://www.w3.org/TR/REC-xml/#NT-Name
if tokens[count] =~ /^[\w:][\w0-9.\-:]*$/
  tagLine += " #{tokens[count]}"
  expectedEqualityOP = true
end

So... what has to be done to get this (or something similar) into the
official distribution to fix that bug?
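
A likely reason the first attempt posted earlier in the thread did not
help: inside a character class, ".-_" is read as a range from "." to
"_", and that range happens to contain both ";" and "?". The sketch
below compares the two patterns from the thread on a made-up token
list; the tokens and constant names are hypothetical.

```ruby
# First attempt from the thread: ".-_" inside the brackets is a range
# from "." (0x2E) to "_" (0x5F), which includes ";" and "?".
FIRST_ATTEMPT = /^([a-zA-Z_:])([\w.-_:]*)$/

# Final pattern from the thread: the hyphen is escaped, so it is literal.
FINAL = /^[\w:][\w0-9.\-:]*$/

# Hypothetical sample tokens for illustration.
tokens = ['title', 'href', 'data-x', 'xml:lang', 'false;?', 'return']

first = tokens.select { |token| token =~ FIRST_ATTEMPT }
final = tokens.select { |token| token =~ FINAL }

puts first.inspect # keeps "false;?" but drops "data-x"
puts final.inspect # drops "false;?" and keeps "data-x"
```

Note that \w also admits a leading digit, which the XML Name production
does not, so the final pattern is still a little more permissive than
the spec.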

Christian Fraenkel <semiharge@gmail.com>

Dec 12, 2007, 12:00:00 PM
to Watir General
Thanks, will do that. Strangely enough, Google Groups didn't show your
reply before I posted :/

Željko Filipin

Dec 12, 2007, 12:02:19 PM
to watir-...@googlegroups.com
On Dec 12, 2007 5:58 PM, Christian Fraenkel <semi...@gmail.com> wrote:
> so... what has to be done to get this (or something similar) into the
> official distribution to fix that bug ?

Bret wrote a blog post. Take a look:

http://www.io.com/~wazmo/blog/archives/2006_09.html

Željko

Christian Fraenkel <semiharge@gmail.com>

Dec 13, 2007, 7:28:33 AM
to Watir General

Bret Pettichord

Dec 18, 2007, 11:42:14 PM
to watir-...@googlegroups.com
Christian Fraenkel<semi...@gmail.com> wrote:
> posted on the jira: http://jira.openqa.org/browse/WTR-190
Thanks for your attention to this. I will look at your fix.

Bret

--
Bret Pettichord
Lead Developer, Watir, http://wtr.rubyforge.org
Blog, http://www.io.com/~wazmo/blog