HTML to plain-text conversion behavior seems to differ from version 1.6.7.2 to 1.6.8 in some cases.

Sri Vishnu Totakura

Jul 21, 2016, 2:04:45 PM
to nokogiri-talk, michael...@experteer.com
Hi,

We recently updated Nokogiri from 1.6.7.2 to 1.6.8 and noticed that a few of our tests, which cover converting HTML from emails to plain text, now fail.

We looked into it, and it looks like Nokogiri doesn't return the same output for some HTML in the two versions.

For example: 

Nokogiri::HTML("<=\nspan>Close</span>").text

returns "Close" on version 1.6.7.2 but returns "<=\nspan>Close" on version 1.6.8. If we remove the special characters =\n from the string, it returns "Close" as expected. These special characters come from the email clients or from parsing (I suppose).
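
The =\n sequence looks like a quoted-printable soft line break, which is common in email bodies. If that is the case here, decoding the body before parsing should avoid handing the parser the broken tag at all. A minimal sketch using only Ruby's standard String#unpack; the helper name decode_quoted_printable is made up for illustration and is not part of Nokogiri, and it assumes the whole body is quoted-printable encoded:

```ruby
# Hypothetical helper (not part of Nokogiri): undo quoted-printable encoding
# before the markup reaches the HTML parser.
def decode_quoted_printable(html)
  # String#unpack("M") decodes quoted-printable: it drops "=\n" soft line
  # breaks and turns "=XX" hex escapes back into bytes.
  decoded = html.unpack("M").first
  # The decoded string may not carry the input's encoding; reusing it here is
  # an assumption that holds only if the body really is in that charset.
  decoded.force_encoding(html.encoding)
end

raw = "<=\nspan>Close</span>"
cleaned = decode_quoted_printable(raw)
```

Passing cleaned instead of raw to Nokogiri::HTML(...).text corresponds to the case above where the special characters are removed.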

Anyway, we are curious whether this is an intended change. We could not find anything in the CHANGELOG that directly refers to this change in behavior (or I'm not very familiar with the terminology used).

Peace,
Vishnu

Mike Dalessio

Feb 10, 2017, 5:01:07 AM
to nokogiri-talk, michael...@experteer.com
Hi,

Apologies for the late reply. In the CHANGELOG it's noted that 1.6.8 included an update of libxml2 from 2.9.2 to 2.9.4, which might explain some small behavioral differences that you're seeing, particularly around how invalid markup is corrected.

-m

