
Strip HTML from msg body


David Chapman

Mar 22, 2002, 1:30:10 AM
I thought I had seen this here before, but I can't find it now.

I'm looking for a way of stripping HTML from a message body and displaying
just the message text in the mutt message body window.

Not interested in spawning a browser to read the message.

Any ideas?

TIA

--
Dave Chapman | "tar is not a plaything"
dcha...@canwest.victoria.bc.ca | bsim...@dt.springfield.edu

Loïc Minier

Mar 22, 2002, 1:39:56 AM
On Fri, 22 Mar 2002 06:30:10 GMT,
David Chapman <dcha...@canwest.victoria.bc.ca> wrote:
> I'm looking for a way of stripping HTML from a message body and displaying
> just the message text in the mutt message body window.
> Not interested in spawning a browser to read the message.

Use w3m, lynx, or links to output your HTML as plain text.
In your .mailcap:

text/html; w3m -dump %s; nametemplate=%s.html; \
copiousoutput

In your muttrc :

auto_view text/html

I have the following in my system-wide mime.types, but it should be the
default:

text/html html htm

--
Loïc Minier <lo...@via.ecp.fr>

Gary Johnson

Mar 21, 2002, 11:26:35 PM
David Chapman <dcha...@canwest.victoria.bc.ca> wrote:
> I thought I had seen this here before, but I can't find it now.
>
> I'm looking for a way of stripping HTML from a message body and displaying
> just the message text in the mutt message body window.
>
> Not interested in spawning a browser to read the message.

This was just discussed on the mutt-users list. Someone there uses a
Perl script called stripmime to do this. I think you can get it from
freshmeat.net. I don't know exactly what it does, but if it doesn't
affect text/plain messages, you could just
'set display_filter=stripmime.pl' in your muttrc. Or you could instead
put in your mailcap file:

text/html; stripmime.pl; copiousoutput

and in your muttrc:

auto_view text/html

I've never used stripmime, so I'm not sure that the above actually
works as written, but it should be close.

That being said, you can use the formatting features of a text-based
browser to convert text/html without really "launching" the browser by
putting either of these in your mailcap:

text/html; lynx -force_html -dump %s; copiousoutput
text/html; w3m -T text/html -dump %s; copiousoutput

Either probably takes a little longer than stripmime, but it may not be
noticeable, depending on your system.

HTH,
Gary

Andreas Kneib

Mar 22, 2002, 2:47:28 AM
* Gary Johnson [Fri, 22 Mar 2002 04:26:35 +0000 (UTC)]:


> text/html; stripmime.pl; copiousoutput
>
> and in your muttrc:
>
> auto_view text/html
>

My stripmime.pl works fine with procmail:


,----[ .procmailrc ]-
|
| :0 f
| * ^Content-Type:.*text/html
| * !^From:.*Kneib
| | /usr/local/bin/stripmime.pl
|
`----

Bye,
Andreas


David Chapman

Mar 22, 2002, 4:28:15 AM
On Fri, 22 Mar 2002 04:26:35 +0000 (UTC), Gary Johnson
<gary...@spk.agilent.com> wrote:
> 'set display_filter=stripmime.pl' in your muttrc. Or you could instead
> put in your mailcap file:
>
> text/html; stripmime.pl; copiousoutput
>
> and in your muttrc:
>
> auto_view text/html
>
I could not get this to work. The display showed that the autoview was done
with stripmime.pl, but the message body still had copious html in it.

> text/html; lynx -force_html -dump %s; copiousoutput
>

This works quite well, thanks.

Thanks to all who replied,

David Chapman

Mar 22, 2002, 4:36:51 AM
On Fri, 22 Mar 2002 08:47:28 +0100, Andreas Kneib <akn...@gmx.net> wrote:

> * Gary Johnson [Fri, 22 Mar 2002 04:26:35 +0000 (UTC)]:
>

> My stripmime.pl works fine with procmail:
>

I look at procmail and my eyes roll back in my head and I lose consciousness.
It is beyond my simple little mind....

I was going to use it for filtering mail but was unsuccessful.

David Chapman

Mar 22, 2002, 4:53:08 AM
On Fri, 22 Mar 2002 08:04:02 +0000 (UTC), Rocco Rutte
<s111...@mail.inf.tu-dresden.de> wrote:

> Sounds interesting, but there's lots of work still to be done. Right now
> I suggest ...


>
>> text/html; lynx -force_html -dump %s; copiousoutput
>> text/html; w3m -T text/html -dump %s; copiousoutput
>

> ... one more: links. Reason: lynx doesn't handle frames very well and
> w3m is not perfect at wrapping lines and producing well formatted text.
> For a simple dump links seems perfect.
>
Is the line for links the same as the line for lynx? I.e., do I just substitute
links for lynx?

Will this also work for elm?

Thanks,

Alexander Wasmuth

Mar 22, 2002, 5:04:18 AM
David Chapman <dcha...@canwest.victoria.bc.ca> wrote:

>> Sounds interesting, but there's lots of work still to be done. Right now
>> I suggest ...
>>
>>> text/html; lynx -force_html -dump %s; copiousoutput
>>> text/html; w3m -T text/html -dump %s; copiousoutput
>>
>> ... one more: links. Reason: lynx doesn't handle frames very well and
>> w3m is not perfect at wrapping lines and producing well formatted text.
>> For a simple dump links seems perfect.
>>
> Is the line for links the same as the line for lynx? I.e., do I just substitute
> links for lynx?

"-dump" is not mentioned in the manpage (Links current [Jul 29 2000
20:21:31]), but nevertheless it seems to be working, so try:

text/html; links -dump %s; copiousoutput

It seems that "-force_html" and "-T" are not supported, but I don't know
that much about links.

Alex
--
Alexander Wasmuth http://alexander.wasmuth.org/

David Chapman

Mar 22, 2002, 5:27:14 AM
On Fri, 22 Mar 2002 11:04:18 +0100, Alexander Wasmuth <alex...@wasmuth.org>
wrote:

> text/html; links -dump %s; copiousoutput
>

Works great, it seems.

Thanks,

David Chapman

Mar 22, 2002, 7:02:37 AM
On Fri, 22 Mar 2002 11:08:00 +0000 (UTC), Rocco Rutte
<s111...@mail.inf.tu-dresden.de> wrote:

> * David Chapman wrote:
>> On Fri, 22 Mar 2002 11:04:18 +0100, Alexander Wasmuth <alex...@wasmuth.org>

>> > text/html; links -dump %s; copiousoutput
>> Works great, it seems.
>

> The (mutt, IIRC) manual also suggests putting 'nametemplate=%s.html' at
> the end of the mailcap entry. I don't know exactly what it does, but I
> guess it always appends '.html' to the filename so that the browser can
> guess what the input is.
>
Looks like it names the temporary file to mutt.html.

> So according to the manual I have:
>
> 'text/html; links -dump %s; copiousoutput; nametemplate=%s.html'
>
> in my ~/.mailcap. No negative changes recognized.
>
Works for me too! Thanks!

Michael P. Reilly

Mar 22, 2002, 11:21:20 PM
In article <POCm8.338154$A44.18...@news2.calgary.shaw.ca>, David Chapman wrote:
> On Fri, 22 Mar 2002 04:26:35 +0000 (UTC), Gary Johnson
> <gary...@spk.agilent.com> wrote:
> > 'set display_filter=stripmime.pl' in your muttrc. Or you could instead
> > put in your mailcap file:
> >
> > text/html; stripmime.pl; copiousoutput
> >
> > and in your muttrc:
> >
> > auto_view text/html
> >
> I could not get this to work. The display showed that the autoview was done
> with stripmime.pl, but the message body still had copious html in it.
>
> > text/html; lynx -force_html -dump %s; copiousoutput
> >
> This works quite well, thanks.
>
> Thanks to all who replied,

You might also want to set "alternative_order" in addition to auto_view.
Auto_view controls what you can see through the pager, alternative_order
controls which part takes precedence with multipart/mixed.

-Arcege

--
+----------------------------------+-----------------------------------+
| Michael P. Reilly | arc...@speakeasy.net |

Christopher Jensen

Mar 22, 2002, 11:52:22 PM
Hi, I have a related question. Is it possible, once the text has been
extracted from the HTML, to use it inside a reply? I receive lots of mail
that has HTML bodies, and I could never reply to the message with the
HTML. Any ideas?

Regards,

Chris


On Fri, 22 Mar 2002 06:30:10 GMT, David Chapman
<dcha...@canwest.victoria.bc.ca> wrote:

David Chapman

Mar 23, 2002, 4:48:47 AM
On Sat, 23 Mar 2002 04:21:20 -0000, Michael P. Reilly
<arc...@golem.speakeasy.net> wrote:

> You might also want to set "alternative_order" in addition to auto_view.
>

Thanks for the tip. Do I just put the statement "alternative_order" in my
.muttrc?

Thanks again,

Will Yardley

Mar 23, 2002, 5:14:39 AM
In article <3cYm8.195699$kb.10...@news1.calgary.shaw.ca>, David
Chapman wrote:
> On Sat, 23 Mar 2002 04:21:20 -0000, Michael P. Reilly
><arc...@golem.speakeasy.net> wrote:

>> You might also want to set "alternative_order" in addition to
>> auto_view.

> Thanks for the tip. Do I just put the statement "alternative_order"
> in my .muttrc?

i use:

# view annoying html mail inline
auto_view text/html
# if plain text and html prefer plain text
alternative_order text/plain text/enriched text/html

--
No copies, please.
To reply privately, simply reply; don't remove anything.

Loïc Minier

Mar 23, 2002, 7:50:01 AM
On Sat, 23 Mar 2002 10:37:28 +0000 (UTC),
Rocco Rutte <s111...@mail.inf.tu-dresden.de> wrote:
> It's really a pity that links/w3m/lynx cannot read from stdin so that
> someone could use it within procmail.

« cat index.html | w3m -T text/html -dump »
« cat index.html | lynx -force_html -dump »

work perfectly here. I don't want my mail to be changed too heavily by
procmail. I only make minor PGP conversions and don't alter anything
else. But you could write a procmail recipe that pipes to w3m or lynx to
turn HTML mails into pure plain-text ones.

I did not manage to use links in a pure pipe, but it can read from
/dev/stdin if this file isn't empty at startup.
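
As a rough sketch of such a stdin-to-stdout filter (the function name
html_to_text is made up for illustration, and lynx's -stdin option is
assumed to be available in the installed version), something along these
lines picks whichever of the two browsers is present:

```shell
#!/bin/sh
# html_to_text: read HTML on stdin, write plain text on stdout.
# Uses w3m if it is installed, otherwise lynx.
# The function name is illustrative only.
html_to_text() {
    if command -v w3m >/dev/null 2>&1; then
        # w3m reads stdin when given a type with -T and no file name.
        w3m -T text/html -dump
    elif command -v lynx >/dev/null 2>&1; then
        # Assumes this lynx build supports -stdin.
        lynx -force_html -dump -stdin
    else
        echo "html_to_text: neither w3m nor lynx found" >&2
        return 127
    fi
}
```

It would be used as « cat index.html | html_to_text », or as the command
of a procmail filter recipe once saved as a script.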

--
Loïc Minier <lo...@via.ecp.fr>

Rocco Rutte

Mar 23, 2002, 11:08:44 AM
* Loïc Minier wrote:
>On Sat, 23 Mar 2002 10:37:28 +0000 (UTC),
>Rocco Rutte <s111...@mail.inf.tu-dresden.de> wrote:
>> It's really a pity that links/w3m/lynx cannot read from stdin so that
>> someone could use it within procmail.
>
> « cat index.html | w3m -T text/html -dump »
> « cat index.html | lynx -force_html -dump »
>
> work perfectly here.

Here too. Except lynx, but I don't want that. Didn't look at the
'examples' section of the w3m manpage.

Anyways, thanks.

Rocco

Bernd Nawothnig

Mar 23, 2002, 2:30:15 PM
Hi Rocco,

On Sat, 23 Mar 2002 (UTC), Rocco Rutte <s111...@mail.inf.tu-dresden.de>
wrote:

>> « cat index.html | w3m -T text/html -dump »


>> « cat index.html | lynx -force_html -dump »

>> work perfectly here.

> Here too. Except lynx,

#v+

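#!/usr/bin/perl
use strict;
use warnings;

# Copy stdin to a file named "tmp" in the current directory.
open (my $out, '>', 'tmp') or die "cannot write tmp: $!";
print {$out} $_ while <>;
close ($out) or die "cannot close tmp: $!";

# Let lynx render the saved HTML as plain text on stdout.
system ('lynx', '-force_html', '-dump', 'tmp');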

#v-


reads stdin and writes to stdout using lynx.

HTH

Bernd

Gary Johnson

Mar 23, 2002, 12:06:51 PM
Rocco Rutte <s111...@mail.inf.tu-dresden.de> wrote:

> Also, I received a mail with one text/plain part saying that everything
> else is within the HTML attachement of the mail. That attachement is, in
> fact, HTML but declared as 'Content-Type: application/octet-stream' with
> base64-encoding.
>
> So my conclusion is that there's yet no general solution of how to
> render all HTML stuff within mails to plain text. If somebody has a
> generally working one, please post a URL!

Some people do. See for example,

http://www.spocom.com/users/gjohnson/mutt/

It works really well for me.

HTH,
Gary

Gary Johnson

Mar 23, 2002, 12:10:23 PM
Michael P. Reilly <arc...@golem.speakeasy.net> wrote:

> You might also want to set "alternative_order" in addition to auto_view.
> Auto_view controls what you can see through the pager, alternative_order
> controls which part takes precedence with multipart/mixed.

I think you meant multipart/alternative. You should see all parts of
multipart/mixed that the pager can render.

Gary

Bernd Nawothnig

Mar 23, 2002, 5:22:01 PM
Hi Rocco,

On Sat, 23 Mar 2002 (UTC), Rocco Rutte <s111...@mail.inf.tu-dresden.de>
wrote:

>> reads stdin and writes to stdout using lynx.

> And needs a temporary file (which is not deleted). I want something
> working 'on the fly'. I'm working on it.

Version 2:

#v+

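$tmp = "/tmp/lynx_tmp.html"; # or a better temp name

# Save stdin so lynx can read it as a file.
open (O, ">", $tmp) or die "cannot write $tmp: $!";
print O $_ while <>;
close (O);

system ("lynx", "-dump", $tmp);
unlink ($tmp); # remove the temporary file afterwards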

#v-

Bernd

--

I've been around so long, I knew Doris Day before she was a virgin.
[Groucho Marx]

Bernd Nawothnig

Mar 25, 2002, 6:31:14 AM
Hi Rocco,

On Sun, 24 Mar 2002 (UTC), Rocco Rutte <s111...@mail.inf.tu-dresden.de>
wrote:

> #!/bin/bash

For bashists only ;-)

> file="/tmp/"`date "+%Y%m%d%S"`-`id -u`.html

Ok, that's better.

> ( tee > "$file" ) && lynx -dump "$file"

tee - I was searching for something like that :)
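
A possible variant of the quoted script, sketched under a few assumptions:
it lets mktemp(1) choose the temporary name instead of building one from
the date, and it removes the file afterwards. The function name
html_dump_stdin and the HTML_DUMP_CMD variable are invented here; the
default command is the lynx call used earlier in this thread.

```shell
#!/bin/sh
# html_dump_stdin: save stdin to a private temp file, run a converter on
# it, then delete the file. HTML_DUMP_CMD is a hypothetical override;
# it defaults to the lynx command from this thread.
html_dump_stdin() {
    tmp=$(mktemp "${TMPDIR:-/tmp}/htmldump.XXXXXX") || return 1
    cat > "$tmp"                      # copy all of stdin into the file
    # Unquoted on purpose so the command and its options split into words.
    ${HTML_DUMP_CMD:-lynx -force_html -dump} "$tmp"
    status=$?
    rm -f "$tmp"                      # clean up even if the converter failed
    return $status
}
```

The -force_html option matters here because the temporary name has no
.html suffix for lynx to guess the type from.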


Bernd

--

She got her good looks from her father. He's a plastic surgeon. [Groucho Marx]

John Wingate

Mar 25, 2002, 8:39:06 AM
Rocco Rutte <s111...@mail.inf.tu-dresden.de> wrote:
> Although I probably won't get any answer here, does someone know why
> the tool 'tee' is called 'tee'?

Like pipes connecting standard output to standard input, the name
arises from a plumbing analogy--in this case, a tee-junction, shaped
like the letter "T". "tee file" copies stdin to stdout, and also copies
it to file:

stdin---+---stdout
        |
        |
      file
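
A small shell sketch of that diagram (the function name tee_demo is made
up for illustration): the data keeps flowing down the pipe to tr, while
tee saves an unmodified copy in a temporary file.

```shell
#!/bin/sh
# tee_demo: show that tee passes stdin on to stdout and also writes
# the same bytes to a file. The function name is illustrative only.
tee_demo() {
    copy=$(mktemp "${TMPDIR:-/tmp}/teedemo.XXXXXX") || return 1
    # The pipe continues past tee; tr sees the same data tee saved.
    printf 'hello\n' | tee "$copy" | tr '[:lower:]' '[:upper:]'
    # The file holds the untouched copy.
    cat "$copy"
    rm -f "$copy"
}
```

Calling tee_demo prints the upper-cased line from the pipe first and then
the saved original from the file.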
--
John Wingate Learning facts takes valuable time that
joh...@worldpath.net could be better spent developing biases.
--Richard Maine

those who know me have no need of my name

Mar 25, 2002, 9:24:50 AM
<20020325121...@klaus.daprodeges.dyndns.org> divulged:

>Although I probably won't get any answer here, does someone know why
>the tool 'tee' is called 'tee'?

it creates a `tee' (joint) in a pipeline, i.e., data continues to flow
through the pipe but is also diverted (copied actually) to a file.

"Jargon File 4.3.1"
tee n.,vt.
[Purdue] A carbon copy of an electronic transmission. "Oh, you're sending
him the bits to that? Slap on a tee for me." From the Unix command
tee(1), itself named after a pipe fitting (see plumbing). Can also mean
`save one for me', as in "Tee a slice for me!" Also spelled `T'.

"Webster's Revised Unabridged Dictionary (1913)"
Tee \Tee\, n.
A short piece of pipe having a lateral outlet, used to
connect a line of pipe with a pipe at a right angle with the
line; -- so called because it resembles the letter {T} in
shape.

--
bringing you boring signatures for 17 years

Doug Morse

Mar 25, 2002, 11:04:21 AM
Hi,

It seems that this topic comes up a lot, namely, to be able to view
and/or convert HTML attachments to plain text. That said, is this
then perhaps something that should be directly incorporated into mutt?
In other words, enhance the built-in pager to display "HTML-enhanced"
messages? Add a feature to convert a text/html part of a multi-part
message to a text/plain part (i.e., change the encoding info, change
the message context, etc.)? Seems like this is a core set of features
that a lot of Unix-types are wanting, and that could potentially be
used across projects (i.e., not only within mutt, but also within say
procmail, etc.). All that said, anyone have any thoughts on what
would need to be the key features of such a solution?

Cheers!
Doug

Will Yardley

Mar 25, 2002, 11:28:33 AM
In article <slrna9uik6...@learn.ltc.vanderbilt.edu>, Doug Morse
wrote:


> It seems that this topic comes up a lot, namely, to be able to view
> and/or convert HTML attachments to plain text. That said, is this
> then perhaps something that should be directly incorporated into mutt?

heresy! i don't personally have an opinion on one side or the other of
this matter, but i think it's a pretty good bet that this won't be
incorporated into mutt.

there are a couple issues here.

1) mutt is a mail client not a web browser. thus why should it need to
read html (and yes, i do realize that the world is full of people who
send this crap, but anyway.... i digress). mutt's whole idea is to be
lightweight and use the *nix philosophy of doing one thing well.

2) currently, using links, lynx, or w3m, or other filters / programs (i
use w3m, personally) works adequately to display html mail in a text
format.

3) further, most people prefer one or the other of these tools. most
likely, a builtin html viewer would be unlikely to please everyone.

4) i generally find that:
alternative_order text/plain text/enriched text/html
takes care of *most* legitimate mail, since most (non-spam) html mail
also has a plain text version.

> Add a feature to convert a text/html part of a multi-part message to a
> text/plain part (i.e,. change the encoding info, change the message
> context, etc.)?

well you could very likely modify a tool like demime or stripmime to do
something like this (and there may well already be such a tool), and
pipe your mail through that before you receive it. i would certainly be
interested in a tool like this, although personally i prefer to have the
message delivered as is - or at least for a copy of the original to be
kept.

for example, if you're making a spam complaint, you may very well want
to have the original message source, both to determine where to send the
LART, and to forward the "original" message along.

Gregor Zattler

Mar 25, 2002, 10:29:02 AM
Rocco Rutte <s111...@mail.inf.tu-dresden.de> wrote:

> * Bernd Nawothnig wrote:
>> On Sun, 24 Mar 2002 (UTC), Rocco Rutte <s111...@mail.inf.tu-dresden.de>
>> wrote:
>> > ( tee > "$file" ) && lynx -dump "$file"
[...]
>
> And usually there're meaningful names such as 'at', 'kill' or 'uptime'.
> But in the case of 'tee' I really have no idea where the name may come
> from (my only idea has something to do with tee, input and output ...
> ... but I don't want to discuss that one here ;-).

>
> Although I probably won't get any answer here, does someone know why
> the tool 'tee' is called 'tee'?

You use pipes, so think of "tee" as a T-shaped adapter:

---- tee ----
      |
      |

Ciao Gregor

Bernd Nawothnig

Mar 25, 2002, 1:28:03 PM
Hi Rocco,

On Mon, 25 Mar 2002 (UTC), Rocco Rutte <s111...@mail.inf.tu-dresden.de>
wrote:

>>> ( tee > "$file" ) && lynx -dump "$file"

>> tee - I was searching for something like that :)

> You know, this is (except in your case when writing ;-) Unix. And in
> Unix there is - as far as I experienced - nothing which is not yet
> implemented (basic things, of course).

In 4dos it's built in too. From the online help (4dos 6.01) to the
internal command 'y':


Purpose: Copy standard input to standard output, and then copy the
specified file(s) to standard output.

Format: Y file ...

file: The file or list of files to send to standard output.

See also: TEE.


But I never used it before and so I didn't remember.

Just another argument against command.com :-)

4dos has many features copied from bash & co (history [with additional
pop up window for selecting from and editing the history: pg-up], true
command line editing, grouping, tab-expansion, lots of useful functions
etc.)

> And usually there're meaningful names such as 'at', 'kill' or 'uptime'.

yes - and 'cp', 'mv', 'ls', 'grep', 'less' or 'most' :-)

> Although I probably won't get any answer here, does someone know why
> the tool 'tee' is called 'tee'?

Sorry, no idea.

Bernd

Bernd Nawothnig

Mar 25, 2002, 5:41:55 PM
Hi Rocco,

On Mon, 25 Mar 2002 (UTC), Rocco Rutte <s111...@mail.inf.tu-dresden.de>
wrote:

> I don't know what '4dos' is but I think some 'command.com' replacement.

Yes.

Bernd
