[jruby-user] Nokogiri 1.5.0 Released

Yoko Harada

Jul 2, 2011, 10:25:27 PM
to us...@jruby.codehaus.org
Hello JRubyists,

Nokogiri 1.5.0 ("Y U NO RELEASE?" Edition) is out!
http://groups.google.com/group/nokogiri-talk/browse_thread/thread/4bab7b9b72b40e5c

This is a pivotal version for JRuby users: there is no libxml/libxslt
behind Nokogiri anymore. When you use Nokogiri on JRuby, you don't need
any C libraries. Instead, Xerces/NekoHTML and a couple of other pure Java
APIs are used. Installing the nokogiri gem is all you need, since the Java
APIs are included in the gem package. Right after the gem is installed,
nokogiri will work on various platforms.

Methods just for JRuby users have been added:
- You can wrap org.w3c.dom.Document to build Nokogiri::XML::Document
- You can get org.w3c.dom.Document from Nokogiri::XML::Document

See "Java Integration" section of
https://github.com/tenderlove/nokogiri/wiki/Pure-Java-Nokogiri-for-JRuby
for details.

Here's a link for those who are worried about performance:
http://blog.flavorjon.es/2011/05/fairy-wing-wrapup-nokogiri-performance.html

Although pure Java Nokogiri is slow at parsing big XML documents, it is
4 times faster than the FFI version.

Give it a try. Give us your feedback.
-Yoko

---------------------------------------------------------------------
To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email


consiliens

Jul 3, 2011, 12:40:09 AM
to us...@jruby.codehaus.org
On 07/02/2011 08:25 PM, Yoko Harada wrote:
> Give it a try. Give us your feedback.

I tried Java Nokogiri recently and premailer didn't work. Are there
limitations on what Java Nokogiri can support? Or will it be able to do
anything regular Nokogiri can do?

https://github.com/alexdunae/premailer/blob/master/lib/premailer/adapter/nokogiri.rb

Yoko Harada

Jul 3, 2011, 1:25:59 AM
to us...@jruby.codehaus.org
Hi,

On Sun, Jul 3, 2011 at 12:40 AM, consiliens <consi...@gmail.com> wrote:
> On 07/02/2011 08:25 PM, Yoko Harada wrote:
>>
>> Give it a try. Give us your feedback.
>
> I tried Java Nokogiri recently and premailer didn't work.  Are there
> limitations on what Java Nokogiri can support?  Or will it be able to do
> anything regular Nokogiri can do?
>
> https://github.com/alexdunae/premailer/blob/master/lib/premailer/adapter/nokogiri.rb

Did you use pure Java Nokogiri in an environment where a custom
class loader was involved? If so, you need to move the Java archives to
a directory that the custom class loader sees, for example
WEB-INF/lib.
If you have an error message, that will help to figure this out.

-Yoko

kristian

Jul 3, 2011, 2:00:21 AM
to us...@jruby.codehaus.org
Which versions of the libraries are these? Which versions of Xerces is
nokogiri compatible with? Servlet containers usually come with some
XML libraries, and xercesImpl is quite common! Is it possible not to
require those jars if Xerces is already in the parent classloader?

- Kristian

On Sun, Jul 3, 2011 at 10:55 AM, Yoko Harada <yok...@gmail.com> wrote:
> Did you use pure Java Nokogiri under the environment that a custom
> class loader was involved? If so, you need to move Java archives to
> the directory that the custom class loader sees. For example,
> WEB-INF/lib.

consiliens

Jul 3, 2011, 2:50:37 AM
to us...@jruby.codehaus.org
On 07/02/2011 11:25 PM, Yoko Harada wrote:
> Did you use pure Java Nokogiri under the environment that a custom
> class loader was involved? If so, you need to move Java archives to
> the directory that the custom class loader sees. For example,
> WEB-INF/lib.
> If you have an error message, that will help to figure out.
>
> -Yoko

There's no error message and no custom class loader. I don't have
hpricot installed, so premailer is forced to use nokogiri.

nokogiri (1.5.0 java)
jruby 1.6.2
Java 1.6.0_26

git clone https://github.com/alexdunae/premailer.git

premailer/test/files$ jruby -S premailer base.html

Nothing is displayed using Java nokogiri.

Running the same command with MRI and regular nokogiri prints the
expected text to standard output.

Reto Schüttel

Jul 4, 2011, 9:06:22 AM
to us...@jruby.codehaus.org
Hi Yoko

Thanks for the update!

> Give it a try. Give us your feedback.

I've ported one of our bigger nokogiri scripts to jruby/nokogiri 1.5.0 and found the following bug:

require 'rubygems'
gem 'nokogiri', '1.5.0'
require 'nokogiri'

input = '<root><p></p></root>'

doc = Nokogiri::XML(input, nil, 'UTF-8')
doc.css("p").first.replace("a b:c")
out = doc.children.first.to_s

raise out.inspect unless out == "<root>a b:c</root>"

This should return <root>a b:c</root>, but it actually returns <root>a c</root>. It works fine with JRuby w/ Nokogiri 1.4.7 and with Ruby w/ Nokogiri 1.5.0.

I'm using JRuby 1.6.1 here!


Cheers!
Reto Schüttel

Yoko Harada

Jul 4, 2011, 1:27:01 PM
to us...@jruby.codehaus.org
Hi,

On Sun, Jul 3, 2011 at 2:50 AM, consiliens <consi...@gmail.com> wrote:
> On 07/02/2011 11:25 PM, Yoko Harada wrote:
>>
>> Did you use pure Java Nokogiri under the environment that a custom
>> class loader was involved? If so, you need to move Java archives to
>> the directory that the custom class loader sees. For example,
>> WEB-INF/lib.
>> If you have an error message, that will help to figure out.
>>
>> -Yoko
>
> There's no error message and no custom class loader.  I don't have hpricot
> installed so premailer is forced to use nokogiri.
>
> nokogiri (1.5.0 java)
> jruby 1.6.2
> Java 1.6.0_26
>
> git clone https://github.com/alexdunae/premailer.git
>
> premailer/test/files$ jruby -S premailer base.html
>
> Nothing is displayed using Java nokogiri.
>
> Running the same command with MRI and regular nokogiri prints the expected
> text to standard output.

OK, I was able to reproduce this. Please file the bug at
https://github.com/tenderlove/nokogiri/issues?state=open
If possible, would you add a simple piece of code that reproduces it? That will help.

Thanks,
-Yoko

Yoko Harada

Jul 4, 2011, 1:24:12 PM
to us...@jruby.codehaus.org
Hi Kristian,

On Sun, Jul 3, 2011 at 2:00 AM, kristian <m.kri...@web.de> wrote:
> which versions of the library are these ? which versions of xerces is
> nokogiri compatible with ? servlets containers usually comes with some
> xml libraries and xercesImpl is quite common ! is it possible not to
> require those jar if xerces is already in the parent classloader ?

I added information about the Java APIs to
https://github.com/tenderlove/nokogiri/wiki/Pure-Java-Nokogiri-for-JRuby
Thanks for asking about this.

It is possible not to require xercesImpl.jar. Please see the "Google App
Engine" section of the wiki above. This is a hack for an old version
of Nokogiri.

However, using the Xerces in the parent classloader is not a good idea.
A single web application, for example a single WAR, should be portable,
so everything needed to run the app should be in the WAR, including
xercesImpl.jar. You'd better let the custom classloader load
everything for the web app.

There's one more reason. Pure Java Nokogiri uses NekoHTML to parse
HTML files, and NekoHTML must be loaded before Xerces for this to work.
In light of this, you should not use the parent classloader.

-Yoko

Yoko Harada

Jul 4, 2011, 2:16:37 PM
to us...@jruby.codehaus.org
Hi Reto

2011/7/4 Reto Schüttel <re...@schuettel.ch>:


> Hi Yoko
>
> Thanks for the update!
>
>> Give it a try. Give us your feedback.
>
> I've ported one of our bigger nokogiri scripts to jruby/nokogiri 1.5.0 and found the following bug:
>
>> require 'rubygems'
>> gem 'nokogiri', '1.5.0'
>> require 'nokogiri'
>>
>> input = '<root><p></p></root>'
>>
>> doc = Nokogiri::XML(input, nil, 'UTF-8')
>> doc.css("p").first.replace("a b:c")
>> out = doc.children.first.to_s
>>
>> raise out.inspect unless out == "<root>a b:c</root>"
>
> This should return <root>a b:c</root>, but it actually returns: <root>a c</root>. It works fine with JRuby w/ Nokogiri 1.4.7 and with Ruby w/ Nokogiri 1.5.0

This was caused by incorrect fragment processing. I fixed the bug in rev. 798d047.

Thanks for using pure Java Nokogiri.
-Yoko

consiliens

Jul 4, 2011, 3:01:12 PM
to us...@jruby.codehaus.org, Yoko Harada
On 07/04/2011 11:27 AM, Yoko Harada wrote:
> OK. This reproduced. Please file the bug at
> https://github.com/tenderlove/nokogiri/issues?state=open
> If possible, would you add a simple reproduce-able code? That will help.
>
> Thanks,
> -Yoko

I filed #485
https://github.com/tenderlove/nokogiri/issues/485

To make it easy to reproduce, I used base.html, which is included as a
sample file in the premailer repository. I couldn't get any file to
work properly with premailer using Java Nokogiri.

Reto Schüttel

Jul 5, 2011, 9:07:57 AM
to us...@jruby.codehaus.org
Hi Yoko

Am 04.07.2011 um 20:16 schrieb Yoko Harada:
>> This should return <root>a b:c</root>, but it actually returns: <root>a c</root>. It works fine with JRuby w/ Nokogiri 1.4.7 and with Ruby w/ Nokogiri 1.5.0
>
> This caused by a wrong fragment processing. I fixed the bug in rev. 798d047.


Thanks for the quick update. It is better now, but our script still trips over the same bug in a different variation:

require 'nokogiri'

input = "<root><p>xyz</p></root>"

doc = Nokogiri::XML(input, nil, 'UTF-8')

p = doc.css("p").first
p.replace("<s/>A:B")

puts doc.to_s

This should print out
<root><s/>A:B</root>

But it produces:
<root><s>A:B</s></root>

Thank you!

Cheers,
Reto

Yoko Harada

Jul 8, 2011, 12:16:34 PM
to us...@jruby.codehaus.org
Hello,

I opened the ticket, https://github.com/tenderlove/nokogiri/issues/490

This comes from a difference between the libxml and Xerces parsers: Xerces
won't parse a string that is not well-formed, but libxml does.

Please follow the issue.
-Yoko

2011/7/5 Reto Schüttel <re...@schuettel.ch>:

Erik Bågfors

Jul 12, 2011, 10:20:48 AM
to us...@jruby.codehaus.org
Thanks for doing this.

I opened two bugs on GitHub that are showstoppers for me: 493 and
492. They are easy to reproduce. For some reason I am not able to add
the label pure-java to them as requested.
Regards,
Erik

On Sun, Jul 3, 2011 at 4:25 AM, Yoko Harada <yok...@gmail.com> wrote:
> Give it a try. Give us your feedback.


Yoko Harada

Jul 12, 2011, 4:04:24 PM
to us...@jruby.codehaus.org
Erik,

On Tue, Jul 12, 2011 at 10:20 AM, Erik Bågfors <zin...@gmail.com> wrote:
> Thanks for doing this.
>
> I opened two bugs in github that are showstoppers for me.  493 and
> 492. They are easy to reproduce.  For some reason I am not able to add
> the label pure-java to them as requested.
> Regards,
> Erik

Thank you! I'll have a look.
-Yoko
