Differences between Nokogiri 1.4 and 1.5 on JRuby

Raimonds Simanovskis

Jan 25, 2011, 3:08:17 PM
to nokogi...@googlegroups.com
After testing one of my JRuby gems that uses Nokogiri I found the following differences between Nokogiri 1.4 and 1.5 on JRuby:

1) A double quote inside an XML node attribute was escaped as &quot; in 1.4 but is escaped as %22 in 1.5.

2) In 1.4, XML node attributes were listed in the same sequence as the keys in the Ruby Hash (and in JRuby it seems that keys are always enumerated in the sequence of their creation), but in 1.5 attributes are always sorted alphabetically.


Semantically the result is the same (actually I don't know about %22; I thought that by the specification it should be &quot;), but it makes testing a little harder when my tests include the expected generated XML output string. I would prefer that Nokogiri 1.5 #to_xml produce the same result as Nokogiri 1.4 :)

Raimonds

Yoko Harada

Jan 25, 2011, 9:32:37 PM
to nokogi...@googlegroups.com
Hello,
Thanks for testing pure Java Nokogiri.

On Tue, Jan 25, 2011 at 3:08 PM, Raimonds Simanovskis
<raimonds.s...@gmail.com> wrote:
> After testing one of my JRuby gems that uses Nokogiri I found the following
> differences between Nokogiri 1.4 and 1.5 on JRuby:
> 1) Double quote inside XML node attributes in 1.4 was quoted like &quot; but
> in 1.5 is quoted like %22

I fixed this bug. (rev. 558e7cd)

> 2) In 1.4 XML node attributes where listed in the same sequence as keys in
> Ruby Hash (and in JRuby it seems that keys are enumerated always in the
> sequence of their creation) but in 1.5 attributes are always sorted
> alphabetically.

I'm looking at this bug. Hopefully, it will be fixed soon.

Thanks for providing code to reproduce it. It is helpful.

> Semantically result is the same (actually don't know about %22 - I thought
> that by specification it should be &quot;) but it makes testing a little bit
> harder when in tests I include expected generated XML output string. I would
> prefer that Nokogiri 1.5 #to_xml would produce the same result as Nokogiri
> 1.4 :)

Yes, that's the best.

-Yoko

> Raimonds

Mike Dalessio

Jan 26, 2011, 7:57:15 AM
to nokogi...@googlegroups.com

On Tue, Jan 25, 2011 at 3:08 PM, Raimonds Simanovskis <raimonds.s...@gmail.com> wrote:
>
> After testing one of my JRuby gems that uses Nokogiri I found the following differences between Nokogiri 1.4 and 1.5 on JRuby:
>
> 1) Double quote inside XML node attributes in 1.4 was quoted like &quot; but in 1.5 is quoted like %22
>

> 2) In 1.4 XML node attributes where listed in the same sequence as keys in Ruby Hash (and in JRuby it seems that keys are enumerated always in the sequence of their creation) but in 1.5 attributes are always sorted alphabetically.
>

> Semantically result is the same (actually don't know about %22 - I thought that by specification it should be &quot;) but it makes testing a little bit harder when in tests I include expected generated XML output string. I would prefer that Nokogiri 1.5 #to_xml would produce the same result as Nokogiri 1.4 :)

Thank you for taking the time to let us know what the transition experience is likely to look like for JRuby users.

Instead of spending time making the output *representation* identical, I'd rather invest the time in better XML testing tools (like lorax (https://github.com/flavorjones/lorax) or nokogiri-diff (https://github.com/postmodern/nokogiri-diff)), which will test for semantic equality of the document structure, and not byte equality of the document representation.
 
How hard would it be to convert your test to use lorax? What enhancements would you like to see in it (other than nice test helpers)?
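
To illustrate the difference between byte equality and semantic equality, here is a minimal sketch. It is not how lorax or nokogiri-diff work internally; it uses REXML as a stand-in so that it runs without extra gems, and `canonical` and `same_xml?` are hypothetical helper names made up for this example. It also joins all text children of an element together, so it is too coarse for mixed content.

```ruby
require "rexml/document"

# Reduce an element to a nested structure that ignores attribute order
# and leading/trailing whitespace in text (hypothetical helper).
def canonical(element)
  attrs = {}
  element.attributes.each { |name, value| attrs[name] = value }
  children = element.elements.map { |child| canonical(child) }
  text = element.texts.map(&:value).join.strip
  [element.name, attrs.sort, text, children]
end

# True when both strings parse to the same element structure,
# regardless of how the attributes were ordered in the source.
def same_xml?(left, right)
  canonical(REXML::Document.new(left).root) ==
    canonical(REXML::Document.new(right).root)
end

same_xml?('<row id="1" name="x"/>', '<row name="x" id="1"/>')
```

A test written this way would not care whether attributes come out in insertion order or sorted.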

Jonathan Rochkind

Jan 26, 2011, 8:57:09 AM
to nokogi...@googlegroups.com
Awesome, thanks for the pointers! Tools for testing XML generation in automated tests are very welcome, I'll check those out.

Even without an upgrade, I've been having trouble figuring out how to easily test XML generation with nokogiri. On a collaborative project I work on, even with everyone thinking they're using the same (1.4) version, some difference in gems/libxml-version/something has led to a different order of attributes in the generated XML on different people's machines. Both results are perfectly valid XML with identical meaning, but it meant our first naive unit tests weren't sufficient.


________________________________________
From: nokogi...@googlegroups.com [nokogi...@googlegroups.com] On Behalf Of Mike Dalessio [mike.d...@gmail.com]
Sent: Wednesday, January 26, 2011 7:57 AM
To: nokogi...@googlegroups.com
Subject: Re: [nokogiri-talk] Differences between Nokogiri 1.4 and 1.5 on JRuby

--

Raimonds Simanovskis

Jan 26, 2011, 9:00:09 AM
to nokogi...@googlegroups.com
Currently I just ignore whitespace differences using a custom RSpec matcher, be_like; here is an example of a simple test

It seems that I could just redefine this custom be_like matcher to use lorax or nokogiri-diff for the comparison. On the other hand, if I want to test the XML end result (which I pass to another Java library), then I feel safer comparing the actual generated XML string. Otherwise I would only be testing whether, when I use Nokogiri to generate a string and then parse that string back into a Nokogiri object, I get the same Nokogiri object back :)
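
For readers who have not seen such a matcher: the real definition of be_like is not shown in this thread, so the following is only a guess at the kind of whitespace-insensitive string comparison it might wrap. `normalize_whitespace` and `like?` are hypothetical names for this sketch.

```ruby
# Remove whitespace between tags and collapse any remaining runs of
# whitespace to a single space (hypothetical helper).
def normalize_whitespace(xml)
  xml.gsub(/>\s+</, "><").gsub(/\s+/, " ").strip
end

# Compare two XML strings while ignoring indentation and line breaks.
def like?(actual, expected)
  normalize_whitespace(actual) == normalize_whitespace(expected)
end

like?("<a>\n  <b/>\n</a>", "<a><b/></a>")
```

A comparison like this still works on the generated string itself, so it stays sensitive to attribute order and to how quotes are escaped.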

Raimonds

Yoko Harada

Jan 26, 2011, 5:20:08 PM
to nokogi...@googlegroups.com
Hello,

On Tue, Jan 25, 2011 at 9:32 PM, Yoko Harada <yok...@gmail.com> wrote:
> Hello,
> Thanks for testing pure Java Nokogiri.
>
> On Tue, Jan 25, 2011 at 3:08 PM, Raimonds Simanovskis
> <raimonds.s...@gmail.com> wrote:
>> After testing one of my JRuby gems that uses Nokogiri I found the following
>> differences between Nokogiri 1.4 and 1.5 on JRuby:
>> 1) Double quote inside XML node attributes in 1.4 was quoted like &quot; but
>> in 1.5 is quoted like %22
>
> I fixed this bug. (rev. 558e7cd)
>
>> 2) In 1.4 XML node attributes where listed in the same sequence as keys in
>> Ruby Hash (and in JRuby it seems that keys are enumerated always in the
>> sequence of their creation) but in 1.5 attributes are always sorted
>> alphabetically.
>
> I'm looking at this bug. Hopefully, it will be fix soon.

I'm sorry to say that I've realized I'm unable to fix this. The sorting
happens within Xerces when an attribute is set on a node. After googling,
I learned there is no option to avoid the sorting. Xerces doesn't care
much about the literal attribute order of a given document, since the
logical document structure is the same.

So the XML testing tools Mike suggested in this thread would be the way to go.

-Yoko

Raimonds Simanovskis

Jan 26, 2011, 5:48:37 PM
to nokogi...@googlegroups.com


On Thursday, January 27, 2011 12:20:08 AM UTC+2, yokolet wrote:

> I'm sorry to say this realized unable to fix. Sorting happens when an
> attribute is set to a node within Xerces. After googling, I learned
> there is no option to avoid sorting. Xerces doesn't think much of the
> literal order of a given document since logical document structure is
> the same.
>
> So, XML testing tool Mike suggested in this thread would be the choice.

Thanks for the clarification.
If it always sorts alphabetically, then it is also predictable, and I can simply update my specs to expect the attributes to always be in alphabetical order.
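
As a rough sketch of what such a spec helper could look like: `expected_tag` is a hypothetical name, and the exact serialization (self-closing tag, single spaces, which characters get escaped) is an assumption for illustration rather than Nokogiri's documented output. It sorts by plain string comparison, which I am assuming matches the alphabetical order described above.

```ruby
# Build the expected string for an empty element, with attributes sorted
# by name and double quotes escaped as &quot; (hypothetical helper).
def expected_tag(name, attrs)
  pairs = attrs.sort_by { |key, _| key.to_s }.map do |key, value|
    escaped = value.to_s.gsub("&", "&amp;").gsub("<", "&lt;").gsub('"', "&quot;")
    %(#{key}="#{escaped}")
  end
  "<#{[name, *pairs].join(' ')}/>"
end

expected_tag("row", name: "x", id: 1)  # => <row id="1" name="x"/>
```

With a helper like this the expected strings no longer depend on the order in which the Hash keys were created.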

Raimonds
