How to remove blank lines inside a document while using Nokogiri?


Arup Rakshit

Nov 7, 2013, 3:24:54 PM
to nokogi...@googlegroups.com
Hi,

I wrote the code below:

require 'nokogiri'

doc = Nokogiri.HTML <<-eot
<p>Here you can find
    <a href="ssNODELINK/SurvivalStatistics">Survival stats </a>
    <a href="ssNODELINK/SmokingStatistics">Smoking stats </a>
    <a href="ssNODELINK/RisksAndCauses"> and Risks </a>
    <a target="_blank" href="http://www.something.ac.uk/"> Something </a>
of recent research</p>
eot

nodesets = doc.css('p > a')
nodesets.each do |nd|
  nd.unlink if nd['href'].include? 'ssNODELINK'
end

puts doc
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body><p>Here you can find
# >>     
# >>     
# >>     
# >>     <a target="_blank" href="http://www.something.ac.uk/"> Something </a>
# >> of recent research</p></body></html>

But my expected output is:

# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body><p>Here you can find    
# >>     <a target="_blank" href="http://www.something.ac.uk/"> Something </a>
# >> of recent research</p></body></html>

Any help in this regard?

Walter Lee Davis

Nov 7, 2013, 3:30:43 PM
to nokogi...@googlegroups.com
So your issue here is whitespace? I've always had to run my Nokogiri output through a regexp to fix that, never found any way to do it in Noko itself.

puts doc.to_html.gsub(/^\s*\n/, "")

Walter
> --
> You received this message because you are subscribed to the Google Groups "nokogiri-talk" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to nokogiri-tal...@googlegroups.com.
> To post to this group, send email to nokogi...@googlegroups.com.
> Visit this group at http://groups.google.com/group/nokogiri-talk.
> For more options, visit https://groups.google.com/groups/opt_out.
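
To illustrate what the regexp above does, here is a minimal sketch that needs only plain Ruby. The `html` string is a hand-built stand-in for the serialized paragraph (an assumption for illustration, not actual Nokogiri output), with whitespace-only lines where the links were removed. Note that `gsub` returns a new string, so this cleans the output only and leaves any parsed document untouched.

```ruby
# Hand-built stand-in for the serialized paragraph; the three
# whitespace-only lines mimic the gaps left by the removed links.
html = [
  "<p>Here you can find",
  "    ",
  "    ",
  "    ",
  '    <a target="_blank" href="http://www.something.ac.uk/"> Something </a>',
  "of recent research</p>"
].join("\n")

# ^ anchors at the start of each line, \s* consumes the whitespace
# (including earlier newlines), and \n is the line's terminator, so
# every run of whitespace-only lines is deleted.
cleaned = html.gsub(/^\s*\n/, "")

puts cleaned
```

The indentation in front of the remaining link survives, because that line contains non-whitespace characters and so never matches the pattern.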

Wayne Brisette

Nov 7, 2013, 3:44:29 PM
to nokogi...@googlegroups.com
Walter: I think what he is referring to is the extra returns he is getting in the output, not 'whitespace' per se. Look at his example again with that in mind. His posted expected results simply don't have the additional line feeds/returns in them. I think that's what he is trying to not have in his output.

Wayne



From: Walter Lee Davis <wa...@wdstudio.com>
To: nokogi...@googlegroups.com
Sent: Thursday, November 7, 2013 2:30 PM
Subject: Re: [nokogiri-talk] how to remove blank lines inside a document while using nokogiri?

Arup Rakshit

Nov 7, 2013, 11:38:53 PM
to nokogi...@googlegroups.com
Thanks!

But I also want to modify the original doc itself, not just format the output.

Walter Lee Davis

Nov 7, 2013, 11:55:09 PM
to nokogi...@googlegroups.com

Arup Rakshit

Nov 21, 2013, 9:32:28 AM
to nokogi...@googlegroups.com
I tried it, but it is not working for me. The blank lines I wanted removed are still there. :(