Bug#654680: python-html2text: 3.200.1-1 breaks parsing of feeds within rss2email

Stefano Rivera

Jan 6, 2012, 8:10:01 AM

tag 654680 + patch
thanks

Hi Joerg (2012.01.05_10:39:33_+0200)
> after python-html2text was upgraded to 3.200.1-1 the feeds read by
> rss2email can't be parsed anymore

It looks like html2text upstream didn't consider unescape() to be part
of the public API, and moved it into the HTML2Text class as a method.
https://github.com/aaronsw/html2text/commit/1a25828d556d30cc689c1bc2c11f52838c57b7ac

I see it's also been marked with a "# @@nobody calls this function?"
comment. Aaron: Are you intending to remove it?

Joerg / Lindsey: The attached patch for rss2email should do the trick
for supporting 3.200.
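
The attached patch is not reproduced in this archive. As a rough sketch of
the kind of change involved, a caller can look for a module-level
unescape() first and fall back to the method. This assumes (it is not
confirmed here) that pre-3.200 html2text exposed unescape(s) at module
level and that 3.200 only offers HTML2Text().unescape(s); get_unescape is
a hypothetical helper name, not rss2email's actual code.

```python
def get_unescape(h2t):
    """Return an entity-decoding callable from an html2text-like module.

    Hypothetical helper: `h2t` is the imported html2text module (or any
    object with the same attributes). Assumed layouts: older releases have
    a module-level unescape(s); 3.200 only has HTML2Text().unescape(s).
    """
    func = getattr(h2t, "unescape", None)
    if callable(func):
        # Older layout (assumed): plain module-level function.
        return func
    # 3.200 layout (assumed): build one HTML2Text instance up front and
    # reuse its bound method, rather than constructing one per call.
    return h2t.HTML2Text().unescape
```

A caller would resolve this once at import time, e.g.
`unescape = get_unescape(html2text)`, and then call `unescape(text)` where
it previously called the module-level function.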

The alternatives are:
* Re-adding a top-level unescape() function to
html2text, but that would have to create an HTML2Text object on each
invocation...
* Moving unescape() (and the functions it calls) back out of the class,
but then HTML2Text.unicode_snob won't be very useful.

SR

--
Stefano Rivera
http://tumbleweed.org.za/
H: +27 21 465 6908 C: +27 72 419 8559 UCT: x3127
html2text-3.200.patch

Debian Bug Tracking System

Jan 6, 2012, 8:10:02 AM

Processing commands for con...@bugs.debian.org:

> tag 654680 + patch
Bug #654680 [python-html2text] python-html2text: 3.200.1-1 breaks parsing of feeds within rss2email
Added tag(s) patch.
> thanks
Stopping processing here.

Please contact me if you need assistance.
--
654680: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=654680
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems


--
To UNSUBSCRIBE, email to debian-bugs...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org

Aaron Swartz

Jan 6, 2012, 8:40:01 AM

Here's the patch:

https://github.com/aaronsw/html2text/commit/d32885c1cd77a17625fe94299896385039373ae7

The @@ was a note to myself to check to see if anything used unescape
before I removed it. Obviously I forgot to do that.

Stefano Rivera

Jan 6, 2012, 8:40:02 AM

Hi Jörg (2012.01.06_15:28:07_+0200)
> Applying this patch together with python-html2text 3.200 made r2e
> consume the complete memory of my system (3.5G).

Oops. Never mind, though: Aaron has re-added unescape(). New upload
coming shortly.

SR

Debian Bug Tracking System

Jan 6, 2012, 9:00:02 AM

Your message dated Fri, 06 Jan 2012 13:48:58 +0000
with message-id <E1RjAAE-...@franck.debian.org>
and subject line Bug#654680: fixed in python-html2text 3.200.2-1
has caused the Debian Bug report #654680,
regarding python-html2text: 3.200.1-1 breaks parsing of feeds within rss2email
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact ow...@bugs.debian.org
immediately.)