The code below exits with an exception at "/usr/lib/ruby/1.8/tidy/tidybuf.rb:39". The HTML file to tidy is 700 KB in size.
require 'net/http'
require 'tidy'

# Fetch the HTML page to be tidied.
cadena_html = Net::HTTP.get(URI.parse("http://www.santanderga.es/intega/IntegaFrontServlet?controller=IntegaHttpHandler&services=ConsFondosSrv&view=/productos/buscadorfondos/mostrar_fondos_inicio.jsp&categoria=&familia=&valoracion=22-01-2008&nombre=&controlvaloracion=no&informe=N&entorno=INTER"))
Tidy.path = '/usr/lib/libtidy.so'
# The block's last expression becomes the value assigned to xml.
xml = Tidy.open(:show_warnings => true) { |tidy|
  tidy.options.output_xml = true
  tidy.options.indent = true
  tidy.options.tidy_mark = false
  tidy.options.numeric_entities = true
  puts tidy.options.show_warnings
  tidy.clean(cadena_html)
}
-- System Information:
Debian Release: lenny/sid
APT prefers unstable
APT policy: (500, 'unstable'), (500, 'stable')
Architecture: i386 (i686)
Kernel: Linux 2.6.23 (PREEMPT)
Locale: LANG=C, LC_CTYPE=C (charmap=ISO-8859-1) (ignored: LC_ALL set to es_ES)
Shell: /bin/sh linked to /bin/bash
Versions of packages libtidy-ruby1.8 depends on:
ii libruby1.8 1.8.6.111-3 Libraries necessary to run Ruby 1.
ii libtidy-0.99-0 20080116cvs-2 HTML syntax checker and reformatte
libtidy-ruby1.8 recommends no packages.
-- no debconf information
The problem is not limited to large HTML: it fails in the same manner
on any piece of HTML, with any options:
angdraug@x41:~$ irb -r tidy
irb(main):001:0> html = '<p>line1</p><p>line2</p>'
=> "<p>line1</p><p>line2</p>"
irb(main):002:0> xml = Tidy.open() {|tidy| tidy.clean(html.to_s.untaint) }
/usr/lib/ruby/1.8/tidy/tidybuf.rb:39: [BUG] Segmentation fault
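
A segmentation fault inside the library takes the whole interpreter down and cannot be caught with rescue, so it may be safer to probe for the crash from a child process. The following is only a rough sketch: the helper name killed_by_signal? and the PROBE constant are made up for illustration and are not part of the tidy bindings, and it assumes the crashing child is terminated by a signal.

```ruby
# Hypothetical helper (not part of the tidy bindings): run a command in a
# child process and report whether that child was terminated by a signal.
def killed_by_signal?(*command)
  system(*command)   # the child's status ends up in $?
  $?.signaled?
end

# The reproduction from the irb session above, packaged as a command line
# for a separate Ruby interpreter. This constant is an assumption made for
# the example.
PROBE = ["ruby", "-rtidy", "-e",
         "Tidy.open { |tidy| tidy.clean('<p>line1</p><p>line2</p>') }"]
```

Calling killed_by_signal?(*PROBE) should then tell an affected system apart from a working one without crashing the calling script.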
This is caused by the new libtidy-0.99-0 (>= 20080116cvs) in unstable;
the same code works fine with libtidy-0.99-0 20051018-1 in testing and
stable.
I will investigate whether it is a bug in libtidy-ruby or in tidy
itself. In the meantime, a temporary solution is to downgrade your
libtidy-0.99-0 package:
apt-get install libtidy-0.99-0=20051018-1
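
To decide whether a given machine needs the downgrade, the installed libtidy version can be compared against the first snapshot reported to fail in this thread. A minimal sketch follows; the helper affected_libtidy? and the constant FIRST_BAD_SNAPSHOT are hypothetical names, and the check looks only at the leading date digits rather than doing a full Debian version comparison.

```ruby
# First snapshot date reported to crash in this thread (20080116cvs).
FIRST_BAD_SNAPSHOT = 20080116

# Hypothetical helper: true if a libtidy package version string such as
# "20080116cvs-2" carries a snapshot date at or after the first bad one.
# Only the leading YYYYMMDD digits are read; epochs and revisions are
# not handled.
def affected_libtidy?(version)
  date = version[/\A\d{8}/]
  raise ArgumentError, "unexpected version: #{version.inspect}" if date.nil?
  date.to_i >= FIRST_BAD_SNAPSHOT
end
```

The version string itself could be obtained with something like dpkg-query -W -f='${Version}' libtidy-0.99-0 and passed to the helper.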
--
Dmitry Borodaenko