Convert HTML to plain text in Ruby

bluesc...@gmail.com

Oct 1, 2008, 6:31:40 PM
to Ruby on Rails: Talk
Hi,

I'm looking for a way to convert html to plain text.
Now, I know about strip_tags, but - as the name says - that only
strips the tags.

What I need is to get stuff like &amp; and &lt; back to & and < too.
Any help?

Thanks,
Mathijs

Richard Luther

Oct 2, 2008, 2:26:38 PM
to Ruby on Rails: Talk
You could use a regexp together with the hash ERB::Util::HTML_ESCAPE
(inverted, so that each entity maps back to its character) to return
the unescaped versions of the characters.
- Richard
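
A minimal sketch of that idea, in case it helps. It assumes a small local escape table instead of relying on ERB::Util::HTML_ESCAPE being loaded; the names HTML_ESCAPES, unescape_html and html_to_text are made up for illustration, and the tag-stripping regexp is only a crude stand-in for strip_tags. It covers just the four entities in the table, not numeric or other named entities.

```ruby
# Hypothetical stand-in for the escape table. Rails provides a similar
# hash as ERB::Util::HTML_ESCAPE; a local copy keeps this self-contained.
HTML_ESCAPES = {
  '&' => '&amp;',
  '<' => '&lt;',
  '>' => '&gt;',
  '"' => '&quot;'
}.freeze

# Invert the table so that each entity maps back to its character.
HTML_UNESCAPES = HTML_ESCAPES.invert.freeze

# A regexp that matches any one of the known entities.
ENTITY_PATTERN = Regexp.union(HTML_UNESCAPES.keys)

# Replace every known entity in a single pass, so that "&amp;lt;"
# becomes "&lt;" and is not unescaped a second time.
def unescape_html(text)
  text.gsub(ENTITY_PATTERN) { |entity| HTML_UNESCAPES[entity] }
end

# Crude tag removal (an assumption, not what strip_tags does internally),
# followed by entity unescaping.
def html_to_text(html)
  unescape_html(html.gsub(/<[^>]*>/, ''))
end

puts html_to_text('<p>Tom &amp; Jerry &lt;3</p>')
```

Stripping the tags first and unescaping afterwards matters: doing it the other way round would turn an escaped "&lt;b&gt;" into something that looks like a real tag and then remove it.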


Walter McGinnis

Oct 2, 2008, 9:56:50 PM
to rubyonra...@googlegroups.com
You might want to check out the example code in the convert_attachment_to plugin:

http://github.com/kete/convert_attachment_to/tree/master

Depending on configuration, it will take an uploaded HTML file (or PDF, MS doc...) and convert it into a plain text attribute, etc. It is probably overkill for what you are after, but it might have something you can learn from.

Cheers,
Walter
