Help with regex and <p> tag

Guilherme

Feb 28, 2014, 8:31:30 AM
to rubyonra...@googlegroups.com
Hi,

I need to create a regex to extract four paragraphs from some text:

<p>
<b>Topic </b>–
abc
</p>


<p>abcd</p>
<p>abcde</p>
<p>abcdef</p>
<p>abcdefg</p>


I need to extract the four paragraphs (the text inside <p></p>, including any nested HTML) from this text using a regex.

How can I solve this problem? I've tried a lot, but I can't get it to work.

Thanks in advance.

Frederick Cheung

Feb 28, 2014, 9:07:53 AM
to rubyonra...@googlegroups.com
The standard advice is not to use regular expressions for this; use an HTML parser like Nokogiri instead.

Fred
 
Thanks in advance.

Guilherme

Feb 28, 2014, 1:27:29 PM
to rubyonra...@googlegroups.com
Yes sir.
Thanks for the tip.
This code does the trick:

      require 'nokogiri'

      doc = Nokogiri::HTML("<p>test</p>")
      doc.search("p")  # returns every <p> element in the document