Bug in ie.all_tag_attributes


Christian Fraenkel <semiharge@gmail.com>

Dec 12, 2007, 6:45:45 AM
to Watir General
I've been accessing a site that has an inline div advertisement. Whenever
that advertisement shows, I get a REXML parsing error, and an error.xml
gets generated.
I do not have access to the original HTML that generates the error, as
the appearance of this specific ad is quite sporadic and it is generated on
the fly by JavaScript; what I can provide is the error.xml, though.
I've taken a look at the source, and I see the following problem.
The tag that generates the error is the following:
<a title="Beim nächsten Start öffnen"
onclick="this.style.behavior='url(#default#homepage)';this.setHomePage(adUri);"
href="#" false;?="false;?" return="return" >
The offending attribute false;?="false;?" makes the REXML parser throw
the following exception:

C:/ruby/lib/ruby/1.8/rexml/parsers/treeparser.rb:90:in `parse':
#<REXML::ParseException: malformed XML: missing tag start
(REXML::ParseException)
Line:
Position:
Last 80 unconsumed characters:
<a title="Beim nächsten Start öffnen"
onclick="this.style.behavior='url(#defau>
C:/ruby/lib/ruby/1.8/rexml/parsers/baseparser.rb:346:in `pull'
C:/ruby/lib/ruby/1.8/rexml/parsers/treeparser.rb:21:in `parse'
C:/ruby/lib/ruby/1.8/rexml/document.rb:190:in `build'
C:/ruby/lib/ruby/1.8/rexml/document.rb:45:in `initialize'
C:/ruby/lib/ruby/gems/1.8/gems/watir-1.5.3/./watir/ie.rb:683:in `new'
C:/ruby/lib/ruby/gems/1.8/gems/watir-1.5.3/./watir/ie.rb:683:in
`create_rexml_document_object'
C:/ruby/lib/ruby/gems/1.8/gems/watir-1.5.3/./watir/ie.rb:663:in
`rexml_document_object'
C:/ruby/lib/ruby/gems/1.8/gems/watir-1.5.3/./watir/ie.rb:867:in
`elements_by_xpath'
C:/ruby/lib/ruby/gems/1.8/gems/watir-1.5.3/./watir/ie.rb:860:in
`element_by_xpath'
./pages.rb:75:in `getRessources'
controller.rb:19
...
malformed XML: missing tag start
Line:
Position:
Last 80 unconsumed characters:
<a title="Beim nächsten Start öffnen"
onclick="this.style.behavior='url(#defau
Line:
Position:
Last 80 unconsumed characters:
<a title="Beim nächsten Start öffnen"
onclick="this.style.behavior='url(#defau from C:/ruby/lib/ruby/1.8/
rexml/document.rb:190:in `build'
from C:/ruby/lib/ruby/1.8/rexml/document.rb:45:in `initialize'
from C:/ruby/lib/ruby/gems/1.8/gems/watir-1.5.3/./watir/ie.rb:683:in
`new'
from C:/ruby/lib/ruby/gems/1.8/gems/watir-1.5.3/./watir/ie.rb:683:in
`create_rexml_document_object'
from C:/ruby/lib/ruby/gems/1.8/gems/watir-1.5.3/./watir/ie.rb:663:in
`rexml_document_object'
from C:/ruby/lib/ruby/gems/1.8/gems/watir-1.5.3/./watir/ie.rb:867:in
`elements_by_xpath'
from C:/ruby/lib/ruby/gems/1.8/gems/watir-1.5.3/./watir/ie.rb:860:in
`element_by_xpath'
from ./pages.rb:75:in `getRessources'
from controller.rb:19

(Part of) the error lies in the regular expression used to check the
attribute name, found in ie.rb at line 762 in all_tag_attributes:
if tokens[count] =~ /^(\w|_|:)(.*)$/
The "(.*)" used here does not match the definition of "(NameChar)*" used in
the XML spec for attribute names: .* does match the offending ;? in
false;? whereas (NameChar)* doesn't.
The solution is to make the regular expression used in
ie.rb more restrictive. I'll take a look at that later, after dinner.
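
A minimal Ruby sketch of the mismatch described above. OLD_ATTR_NAME is the pattern quoted from ie.rb; ASCII_XML_NAME and both helper methods are hypothetical names, and the stricter pattern is only an ASCII-only approximation of the XML spec's Name production, not Watir's code.

```ruby
# Pattern quoted above from watir/ie.rb (Watir 1.5.3).
OLD_ATTR_NAME = /^(\w|_|:)(.*)$/

# Hypothetical ASCII-only approximation of the XML "Name" production
# (a NameStartChar followed by NameChar*). The real production also
# allows many non-ASCII characters that this pattern rejects.
ASCII_XML_NAME = /\A[A-Za-z_:][A-Za-z0-9_:.\-]*\z/

# True if the pattern from ie.rb accepts the token as an attribute name.
def old_accepts?(token)
  !(token =~ OLD_ATTR_NAME).nil?
end

# True if the stricter, hypothetical pattern accepts the token.
def strict_accepts?(token)
  !(token =~ ASCII_XML_NAME).nil?
end

# Tokens taken from the offending <a> tag in the post.
%w[title onclick href false;? return].each do |token|
  puts "#{token.inspect}: old=#{old_accepts?(token)} strict=#{strict_accepts?(token)}"
end
```

With these definitions only 'false;?' is treated differently: the ie.rb pattern accepts it because ".*" matches anything, while the stricter pattern rejects it.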

A related question:
how are bug reports handled here? Should I have posted this
straight to the Jira tracker, or ...?

Thanks in advance,

Christian

Željko Filipin

Dec 12, 2007, 6:58:13 AM
to watir-...@googlegroups.com
Hi Christian,

Comments are inline.


On Dec 12, 2007 12:45 PM, Christian Fraenkel <semi...@gmail.com> wrote:
> C:/ruby/lib/ruby/1.8/rexml/parsers/treeparser.rb:90:in `parse':
> Last 80 unconsumed characters:
> <a title="Beim nächsten Start öffnen"

I had problems parsing XML when it contained non-English characters. For more information see
http://zeljkofilipin.com/2006/03/15/utf-8-and-ruby/


> how are bug reports handled here, should I have tried to post this
> straight on the jiira tracker or .. .?

Since this looks like a REXML bug, there is no point in creating a Watir ticket, right?

Željko
--
ZeljkoFilipin.com

Christian Fraenkel <semiharge@gmail.com>

Dec 12, 2007, 7:40:53 AM
to Watir General


On Dec 12, 12:58 pm, "Željko Filipin" <zeljko.fili...@gmail.com>
wrote:
> Hi Christian,
>
> Comments are inline.
>
> On Dec 12, 2007 12:45 PM, Christian Fraenkel <semiha...@gmail.com> wrote:
> > C:/ruby/lib/ruby/1.8/rexml/parsers/treeparser.rb:90:in `parse':
> > Last 80 unconsumed characters:
> > <a title="Beim nächsten Start öffnen"
>
> I had problems with parsing xml when there was non-English characters. For
> more information see http://zeljkofilipin.com/2006/03/15/utf-8-and-ruby/

I ran into the problem you describe yesterday and already found
the fix you posted for that issue. (Read: my code currently contains
those lines and still throws the error.)

> > how are bug reports handled here, should I have tried to post this
> > straight on the jiira tracker or .. .?
>
> Since this looks like REXML bug, there is no point in creating Watir ticket,
> right?

It's not a REXML bug. REXML has every right to throw an error here, as
Watir is not feeding it correct XML.
In my case Watir is generating XML that contains ';' and '?' as part of an
attribute name. This is incorrect XML and has to be fixed
on the Watir side.
Watir uses /^(\w|_|:)(.*)$/ to accept attribute names. This is too
"open"; it needs to be much more restrictive, perhaps along the lines of
/^([a-zA-Z_:])([\w.-_:]*)$/. This is not perfect (it probably excludes
many Unicode characters that are valid in attribute names), but
it is correct insofar as it does not allow any forbidden characters,
so REXML should never throw an error about that anymore.

Željko Filipin

Dec 12, 2007, 11:33:24 AM
to watir-...@googlegroups.com
It looks to me like you have good evidence that this is a Watir bug. A nice way to get it fixed is to submit a Jira ticket. A better way is to fix it and submit a patch.

Željko

Christian Fraenkel <semiharge@gmail.com>

Dec 12, 2007, 11:58:25 AM
to Watir General
It turns out my earlier post was not correct, as my own regular expression
didn't fix my problem.
The correct regular expression to fix this bug is /^[\w:][\w0-9.\-:]*$/

The whole block then looks like this
(watir/ie.rb:761):
    # If attribute name is valid. Refer: http://www.w3.org/TR/REC-xml/#NT-Name
    if tokens[count] =~ /^[\w:][\w0-9.\-:]*$/
      tagLine += " #{tokens[count]}"
      expectedEqualityOP = true
    end
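
One likely reason the first attempt did not help, sketched in Ruby: inside a character class, the unescaped ".-_" is read as a range from '.' to '_', and that range includes both ';' and '?'. Both patterns below are copied from the posts; the constant and method names are hypothetical and used only for illustration.

```ruby
# Pattern from the earlier post; ".-_" is an unescaped range here.
FIRST_ATTEMPT = /^([a-zA-Z_:])([\w.-_:]*)$/

# Pattern from the corrected post; the hyphen is escaped.
CORRECTED = /^[\w:][\w0-9.\-:]*$/

# True if the given pattern accepts the token as an attribute name.
def accepted_by?(pattern, token)
  !(token =~ pattern).nil?
end

# Every character covered by the accidental '.'..'_' range.
def accidental_range
  ('.'.ord..'_'.ord).map { |code| code.chr }
end

bad = 'false;?'
puts "first attempt accepts #{bad.inspect}: #{accepted_by?(FIRST_ATTEMPT, bad)}"
puts "corrected accepts #{bad.inspect}: #{accepted_by?(CORRECTED, bad)}"
puts "characters in the '.'..'_' range: #{accidental_range.join}"
```

Escaping the hyphen, as the corrected pattern does, or placing it last in the class keeps it a literal character rather than a range operator.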

So... what has to be done to get this (or something similar) into the
official distribution to fix that bug?

Christian Fraenkel <semiharge@gmail.com>

Dec 12, 2007, 12:00:00 PM
to Watir General
Thanks, will do that. Strangely enough, Google Groups didn't show your
reply before I posted :/

Željko Filipin

Dec 12, 2007, 12:02:19 PM
to watir-...@googlegroups.com
On Dec 12, 2007 5:58 PM, Christian Fraenkel <semi...@gmail.com> wrote:
> so... what has to be done to get this (or something similar) into the
> official distribution to fix that bug ?

Bret wrote a blog post about that. Take a look:

http://www.io.com/~wazmo/blog/archives/2006_09.html

Željko

Christian Fraenkel <semiharge@gmail.com>

Dec 13, 2007, 7:28:33 AM
to Watir General
posted on the jira: http://jira.openqa.org/browse/WTR-190

Bret Pettichord

Dec 18, 2007, 11:42:14 PM
to watir-...@googlegroups.com
Christian Fraenkel <semi...@gmail.com> wrote:
> posted on the jira: http://jira.openqa.org/browse/WTR-190
Thanks for your attention to this. I will look at your fix.

Bret

--
Bret Pettichord
Lead Developer, Watir, http://wtr.rubyforge.org
Blog, http://www.io.com/~wazmo/blog
