Removing HTML from text

Bill H

May 7, 2008, 6:29:24 AM
I am looking for a Perl routine that will strip HTML from a text file
and allow me to set up exceptions. For example, remove all HTML except
<B> <I> <U> <P> and their close tags, and optionally (preferably)
clean up any <P> tags so that they contain just <P ALIGN="LEFT"> (or right
or center).

Is there such a beast out there before I write my own code?

Bill H

Ben Bullock

May 7, 2008, 7:08:22 AM

The place to look is http://search.cpan.org/.

Jürgen Exner

May 7, 2008, 8:18:35 AM
Bill H <bi...@ts1000.us> wrote:
>I am looking for a Perl routine that will strip HTML from a text file
>and allow me to set up exceptions. For example, remove all HTML except

Your Question is Asked Frequently: perldoc -q "remove HTML"

><B> <I> <U> <P> and their close tags, and optionally (preferably)
>clean up any <P> tags so that they contain just <P ALIGN="LEFT"> (or right
>or center).

You can define any custom action (remove or retain or whatever you like)
for any HTML element.

jue
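
As an illustration of "a custom action for any HTML element", here is a minimal sketch that assumes the HTML::Parser module from CPAN (the thread does not name a specific module, so that choice is an assumption). The helper name strip_html, the %keep whitelist, and the rule that a <P> without a usable ALIGN falls back to LEFT are all hypothetical choices made for this example, not something the posters specified.

```perl
use strict;
use warnings;
use HTML::Parser;    # CPAN module, not part of core Perl (assumed installed)

# Tags to retain (hypothetical whitelist taken from the question).
# Any other tag is dropped, but the text inside it is kept.
my %keep = map { $_ => 1 } qw(b i u p);

# Hypothetical helper: returns $html with all non-whitelisted tags removed.
sub strip_html {
    my ($html) = @_;
    my $out = '';

    my $parser = HTML::Parser->new(
        api_version => 3,

        # Start tags: 'tagname' arrives lowercased, 'attr' is a hash ref
        # whose keys are lowercased attribute names.
        start_h => [ sub {
            my ($tag, $attr) = @_;
            return unless $keep{$tag};
            if ($tag eq 'p') {
                # Assumption: a missing or unknown ALIGN becomes LEFT.
                my $align = uc($attr->{align} // 'left');
                $align = 'LEFT' unless $align =~ /^(?:LEFT|RIGHT|CENTER)$/;
                $out .= qq{<P ALIGN="$align">};
            }
            else {
                # Re-emit the bare tag, discarding every attribute.
                $out .= '<' . uc($tag) . '>';
            }
        }, 'tagname, attr' ],

        # End tags: keep only those on the whitelist.
        end_h => [ sub {
            my ($tag) = @_;
            $out .= '</' . uc($tag) . '>' if $keep{$tag};
        }, 'tagname' ],

        # Plain text passes through exactly as it appeared in the source.
        text_h => [ sub { $out .= $_[0] }, 'text' ],
    );

    # Drop <script> and <style> together with their contents.
    $parser->ignore_elements(qw(script style));

    $parser->parse($html);
    $parser->eof;    # flush any text the parser is still buffering
    return $out;
}
```

Calling strip_html($text) on the contents of the file returns the cleaned string. Comments and declarations are discarded because no handler is registered for them. Changing the whitelist or the <P> rule only means editing %keep or the start-tag handler.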
