I find html2markdown the best value for mu4e-html2text-command

Ævar Arnfjörð Bjarmason

Dec 24, 2015, 11:18:51 AM
to mu-di...@googlegroups.com
It's part of the python-html2text package in Debian. I used to use:

w3m -dump -cols 80 -T text/html

And also experimented with the default:

html2text

But I found that both w3m and html2text would simply make hyperlinks
disappear, so the E-Mail will reference links in the text that I won't
see.

I set my config to:

(setq mu4e-html2text-command "html2markdown --body-width=0")

I disabled text wrapping (--body-width=0) because I've found that even
with html2markdown, links that contain e.g. "-" get wrapped across
lines, which results in broken links.

Dirk-Jan C. Binnema

Dec 24, 2015, 12:21:45 PM
to mu-di...@googlegroups.com

On Thursday Dec 24 2015, Ævar Arnfjörð Bjarmason wrote:

> It's part of the python-html2text package in Debian. I used to use:
>
> w3m -dump -cols 80 -T text/html
>
> And also experimented with the default:
>
> html2text
>
> But I found that both w3m and html2text would simply make hyperlinks
> disappear, so the E-Mail will reference links in the text that I won't
> see.

Yeah, html2text is rather limited... it's the default since it's emacs'
default.

> I set my config to:
>
> (setq mu4e-html2text-command "html2markdown --body-width=0")
>
> I disabled text wrapping (--body-width=0) because I've found that even
> with html2markdown, links that contain e.g. "-" get wrapped across
> lines, which results in broken links.

I've found shr (eww) pretty good - it's part of emacs as well:

(setq mu4e-html2text-command 'mu4e-shr2text'

and, depending on your theme, you might want to set:

;; make shr/eww readable with dark themes
(setq shr-color-visible-luminance-min 80)

Merry Christmas,
Dirk.

--
Dirk-Jan C. Binnema Helsinki, Finland
e:dj...@djcbsoftware.nl w:www.djcbsoftware.nl
pgp: D09C E664 897D 7D39 5047 A178 E96A C7A1 017D DA3C

Igor Sosa Mayor

Dec 27, 2015, 5:14:57 AM
to mu-di...@googlegroups.com
Dirk-Jan C. Binnema <dj...@djcbsoftware.nl>
writes:

> On Thursday Dec 24 2015, Ævar Arnfjörð Bjarmason wrote:
>
>> It's part of the python-html2text package in Debian. I used to use:
>>
>> w3m -dump -cols 80 -T text/html
>>
>> And also experimented with the default:
>>
>> html2text
>>
>> But I found that both w3m and html2text would simply make hyperlinks
>> disappear, so the E-Mail will reference links in the text that I won't
>> see.
>
> Yeah, html2text is rather limited... it's the default since it's emacs'
> default.
>
>> I set my config to:
>>
>> (setq mu4e-html2text-command "html2markdown --body-width=0")
>>
>> I disabled text wrapping (--body-width=0) because I've found that even
>> with html2markdown, links that contain e.g. "-" get wrapped across
>> lines, which results in broken links.
>
> I've found shr (eww) pretty good - it's part of emacs as well:
>
> (setq mu4e-html2text-command 'mu4e-shr2text'

There is obviously a small error in this line... could you maybe post
exactly what your config is?

Many thanks in advance.


--
:: Igor Sosa Mayor :: joseleop...@gmail.com ::
:: GnuPG: 0x1C1E2890 :: http://www.gnupg.org/ ::
:: jabberid: rogorido :: ::

Dirk-Jan C. Binnema

Dec 27, 2015, 5:25:54 AM
to mu-di...@googlegroups.com

On Sunday Dec 27 2015, Igor Sosa Mayor wrote:

>> (setq mu4e-html2text-command 'mu4e-shr2text'

> There is obviously a small error in this line... could you maybe post
> exactly what your config is?

http://www.djcbsoftware.nl/code/mu/mu4e/Displaying-rich_002dtext-messages.html

Igor Sosa Mayor

Dec 27, 2015, 6:03:57 AM
to mu-di...@googlegroups.com
Dirk-Jan C. Binnema <dj...@djcbsoftware.nl>
writes:

> On Sunday Dec 27 2015, Igor Sosa Mayor wrote:
>
>>> (setq mu4e-html2text-command 'mu4e-shr2text'
>
>> There is obviously a small error in this line... could you maybe post
>> exactly what your config is?
>
> http://www.djcbsoftware.nl/code/mu/mu4e/Displaying-rich_002dtext-messages.html

Thanks. I was confused since I could not find mu4e-shr2text, but I see
it's declared in mu4e-contrib.

(btw: thanks for mu!)

Eduardo Mercovich

Jan 25, 2016, 11:35:16 AM
to mu-di...@googlegroups.com
Dear Ævar.

> [...] But I found that both w3m and html2text would simply make hyperlinks
> disappear, so the E-Mail will reference links in the text that I won't
> see.
> I set my config to:
> (setq mu4e-html2text-command "html2markdown --body-width=0")

Thank you very much for your tip. It works beautifully for me and now I
see the link URLs. Thank you again... :)

This solution raises some small questions that may -hopefully- be
generalized and useful for others.

In specific terms:

+ there are a lot of "&nbsp_place_holder;" in the resulting conversion.
A simple search & replace for a space is enough.

+ many links appear duplicated (url and text are the same). Again, a
simple search & replace to eliminate the 2nd is enough.

So, the question is: where can we add some post-processing to the
html to text conversion?

Or more generally: does anyone else have this need? Does
it make sense to generalize this post-processing?

Thanks a lot for your attention... :)

Best.


--
eduardo mercovich

Where your talents cross
with the needs of the world,
there lies your vocation. (Aristotle)

Olaf Meeuwissen

Jan 25, 2016, 6:09:38 PM
to mu-di...@googlegroups.com

Eduardo Mercovich writes:

> Dear Ævar.
>
>> [...] But I found that both w3m and html2text would simply make hyperlinks
>> disappear, so the E-Mail will reference links in the text that I won't
>> see.
>> I set my config to:
>> (setq mu4e-html2text-command "html2markdown --body-width=0")
>
> Thank you very much for your tip. It works beautifully for me and now I
> see the link URLs. Thank you again... :)
>
> This solution raises some small questions that may -hopefully- be
> generalized and useful for others.
>
> In specific terms:
>
> + there are a lot of "&nbsp_place_holder;" in the resulting conversion.
> A simple search & replace for a space is enough.
>
> + many links appear duplicated (url and text are the same). Again, a
> simple search & replace to eliminate the 2nd is enough.
>
> So, the question is: where can we add some post-processing to the
> html to text conversion?
>
> Or more generally: does anyone else have this need? Does
> it make sense to generalize this post-processing?

I also ran into the &nbsp_place_holder; issue. It is put in the output
by html2markdown on purpose (but I forgot why). I haven't figured out
how to control this on the html2markdown side, so just put this in my
.emacs.d/init.el:

(setq mu4e-html2text-command "html2markdown --body-width=0 | sed \"s/&nbsp_place_holder;/ /g; /^$/d\"")

Note, this also gets rid of a lot of extra empty lines that bothered me.

For more extensive post-processing I would use a custom script (in
$HOME/bin/ ;-) and make that the mu4e-html2text-command.
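
A minimal sketch of what such a script could look like, written here in
Python. It is not taken from anyone's actual setup: the function names and
the choice to turn a "[url](url)" link into "<url>" are assumptions made
for illustration. It replaces the "&nbsp_place_holder;" strings with
spaces, collapses links whose text merely repeats the URL, and drops empty
lines, mirroring the sed expression above.

```python
#!/usr/bin/env python3
"""Hypothetical post-processing filter for html2markdown output."""
import re
import sys

# Placeholder string reported in this thread for non-breaking spaces.
NBSP_PLACEHOLDER = "&nbsp_place_holder;"

# Inline Markdown link of the form [text](url); assumes no spaces in the URL.
LINK_RE = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")


def _dedupe_link(match):
    """Collapse [url](url) into <url>; leave every other link untouched."""
    text, url = match.group(1), match.group(2)
    if text.strip() == url:
        return "<" + url + ">"
    return match.group(0)


def clean(markdown):
    """Replace placeholders, collapse duplicated links, drop empty lines."""
    markdown = markdown.replace(NBSP_PLACEHOLDER, " ")
    markdown = LINK_RE.sub(_dedupe_link, markdown)
    # Same effect as sed's /^$/d: only completely empty lines are removed.
    lines = [line for line in markdown.splitlines() if line != ""]
    return "\n".join(lines) + ("\n" if lines else "")


if __name__ == "__main__":
    # Read the converted message on stdin, write the cleaned text to stdout.
    sys.stdout.write(clean(sys.stdin.read()))
```

Saved under a hypothetical name such as $HOME/bin/mu4e-clean and made
executable, it could be appended to the existing command with a pipe, in
the same way as the sed call:

(setq mu4e-html2text-command "html2markdown --body-width=0 | $HOME/bin/mu4e-clean")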

> Thanks a lot for your attention... :)

Hope this helps,
--
Olaf Meeuwissen, LPIC-2 FLOSS Engineer -- EPSON AVASYS CORPORATION
Free Software Foundation Associate Member since 2004-01-27
Support Free Software https://my.fsf.org/donate
Join the Free Software Foundation https://my.fsf.org/join

Eduardo Mercovich

Feb 8, 2016, 7:46:25 AM
to mu-di...@googlegroups.com
Dear Olaf.

Sorry for the delay, I've been offline for some time.

[...]
>> + there are a lot of "&nbsp_place_holder;" in the resulting conversion.
[...]

> I also ran into the &nbsp_place_holder; issue. It is put in the output
> by html2markdown on purpose (but I forgot why). I haven't figured out
> how to control this on the html2markdown side, so just put this in my
> .emacs.d/init.el:

> (setq mu4e-html2text-command "html2markdown --body-width=0 | sed \"s/&nbsp_place_holder;/ /g; /^$/d\"")

Thanks a lot, this really cleaned these emails (and my eyes too). ;)

Best...

Dirk-Jan C. Binnema

Feb 8, 2016, 2:05:03 PM
to mu-di...@googlegroups.com

On Monday Feb 08 2016, Eduardo Mercovich wrote:

> Dear Olaf.
>
> Sorry for the delay, I've been offline for some time.
>
> [...]
>>> + there are a lot of "&nbsp_place_holder;" in the resulting conversion.
> [...]
>
>> I also ran into the &nbsp_place_holder; issue. It is put in the output
>> by html2markdown on purpose (but I forgot why). I haven't figured out
>> how to control this on the html2markdown side, so just put this in my
>> .emacs.d/init.el:
>
>> (setq mu4e-html2text-command "html2markdown --body-width=0 | sed \"s/&nbsp_place_holder;/ /g; /^$/d\"")
>
> Thanks a lot, this really cleaned these emails (and my eyes too). ;)

Note, for people using emacs 24.4 or later, shr is a nice (built-in)
renderer; in 0.9.16:
--8<---------------cut here---------------start------------->8---
(require 'mu4e-contrib)
(setq mu4e-html2text-command 'mu4e-shr2text)
--8<---------------cut here---------------end--------------->8---

and in current git builds, it's the default.

Kind regards,
Dirk.

Eduardo Mercovich

Feb 29, 2016, 10:08:05 AM
to mu-di...@googlegroups.com
Hi Dirk.

[...]
> Note, for people using emacs 24.4 or later, shr is a nice (built-in)
> renderer; in 0.9.16:
> --8<---------------cut here---------------start------------->8---
> (require 'mu4e-contrib)
> (setq mu4e-html2text-command 'mu4e-shr2text)
> --8<---------------cut here---------------end--------------->8---

I just tried it with 0.9.17 and it works great, thank you.

Just one question: it renders the link text (which is fine), but I
can't see the URL or copy it with the default keybinding "k", nor can I
open a link in the browser with "g".

What am I missing here?

Thanks a lot... :)