I get a HeaderParseError during decode_header(), but Thunderbird can
display the name.
>>> from email.header import decode_header
>>> decode_header('=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?=')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/email/header.py", line 101, in decode_header
    raise HeaderParseError
email.errors.HeaderParseError
How can I parse this in Python?
Thomas
Same question on Stack Overflow:
http://stackoverflow.com/questions/6568596/headerparseerror-in-python
--
Thomas Guettler, http://www.thomas-guettler.de/
E-Mail: guettli (*) thomas-guettler + de
> I get a HeaderParseError during decode_header(), but Thunderbird can
> display the name.
>
> >>> from email.header import decode_header
> >>> decode_header('=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?=')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/lib64/python2.6/email/header.py", line 101, in decode_header
>     raise HeaderParseError
> email.errors.HeaderParseError
>
>
> How can I parse this in Python?
Trying to decode as much as possible:
>>> s = "QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?="
>>> for n in range(len(s), 0, -1):
...     try: t = s[:n].decode("base64")
...     except: pass
...     else: break
...
>>> n, t
(49, 'Anmeldung Netzanschluss S\x19\x1c\x9a[\x99\xcc\xdc\x0b\x9a\x9c\x19')
>>> print t.decode("iso-8859-1")
Anmeldung Netzanschluss S[ÌÜ
>>> s[n:]
'w==?='
The characters after "...Netzanschluss " look like garbage. What does
Thunderbird display?
Hi Peter, Thunderbird shows this:
Anmeldung Netzanschluss Südring3p.jpg
Thomas
> On 04.07.2011 11:51, Peter Otten wrote:
>> Thomas Guettler wrote:
>>
>>> I get a HeaderParseError during decode_header(), but Thunderbird can
>>> display the name.
>>>
>>> >>> from email.header import decode_header
>>> >>> decode_header('=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?=')
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>>   File "/usr/lib64/python2.6/email/header.py", line 101, in decode_header
>>>     raise HeaderParseError
>>> email.errors.HeaderParseError
>> The characters after "...Netzanschluss " look like garbage. What does
>> Thunderbird display?
>
> Hi Peter, Thunderbird shows this:
>
> Anmeldung Netzanschluss Südring3p.jpg
>>> a = u"Anmeldung Netzanschluss Südring3p.jpg".encode("iso-8859-1").encode("base64")
>>> b = "QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?="
>>> for i, (x, y) in enumerate(zip(a, b)):
...     if x != y: print i, x, y
...
33 / _
52
?
>>> b.decode("base64")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/encodings/base64_codec.py", line 42, in base64_decode
    output = base64.decodestring(input)
  File "/usr/lib/python2.6/base64.py", line 321, in decodestring
    return binascii.a2b_base64(s)
binascii.Error: Incorrect padding
>>> b.replace("_", "/").decode("base64")
'Anmeldung Netzanschluss S\xfcdring3p.jpg'
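The comparison above needs the expected text. When that is not known, a rough way to find the offending characters is to scan the payload for anything outside the standard base64 alphabet. This is only a sketch for Python 3 (an assumption, since the session above is Python 2.6); `STANDARD_ALPHABET` and `non_standard_chars` are hypothetical names, and the payload is the one from this thread without the trailing "?=".

```python
import string

# Standard base64 alphabet (RFC 4648, section 4) plus the padding character.
STANDARD_ALPHABET = set(string.ascii_letters + string.digits + "+/=")

def non_standard_chars(payload):
    """Return (index, character) pairs for characters outside the alphabet."""
    return [(i, c) for i, c in enumerate(payload) if c not in STANDARD_ALPHABET]

payload = "QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw=="
print(non_standard_chars(payload))  # prints [(33, '_')]
```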
Looks like you encountered a variant of base64 that uses "_" instead of "/"
for the value 63 (the full variant also uses "-" instead of "+" for 62).
The Wikipedia page http://en.wikipedia.org/wiki/Base64 calls that base64url.
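For the payload alone, the standard library can already decode this alphabet. A minimal sketch, assuming Python 3 (the session above is Python 2.6) and assuming the charset named in the encoded word, iso-8859-1, is correct:

```python
import base64

# Payload copied from the encoded word in this thread, without the "?=" terminator.
payload = "QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw=="

# urlsafe_b64decode maps "-" to "+" and "_" to "/" before decoding.
raw = base64.urlsafe_b64decode(payload)

# The encoded word declares iso-8859-1 as its charset.
text = raw.decode("iso-8859-1")
print(text)
```

This only handles the bare payload; the "=?charset?B?...?=" wrapper still has to be stripped or parsed separately.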
You could try and make the email package accept that with a monkey patch
like the following:
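# untested
import binascii
from email import base64mime

def a2b_base64(s):
    # Map the base64url character "_" back to "/" before decoding.
    return binascii.a2b_base64(s.replace("_", "/"))

# Replace the decoder that email.base64mime imported from binascii.
base64mime.a2b_base64 = a2b_base64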
Alternatively, monkey-patch the binascii module before you import the email
package.
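If patching library internals feels too invasive, another option is to rewrite the header before it reaches decode_header(). The following is a minimal sketch, assuming Python 3 (the session above is Python 2.6); `normalize_b_words` and `decode_header_lenient` are hypothetical helper names, not part of the email package. Only B-encoded words are touched, because in Q-encoded words "_" stands for a space and must not be replaced.

```python
import re
from email.header import decode_header

# Matches "=?charset?B?payload?=" and captures prefix, payload, and terminator.
_B_WORD = re.compile(r"(=\?[^?]+\?[bB]\?)([^?]*)(\?=)")

def normalize_b_words(header):
    """Rewrite base64url characters in B-encoded words to the standard alphabet."""
    def fix(match):
        payload = match.group(2).replace("_", "/").replace("-", "+")
        return match.group(1) + payload + match.group(3)
    return _B_WORD.sub(fix, header)

def decode_header_lenient(header):
    """Decode a header after normalizing its B-encoded words; return a str."""
    parts = decode_header(normalize_b_words(header))
    return "".join(
        # Parts without a charset are treated as ASCII (an assumption of this sketch).
        chunk.decode(charset or "ascii", "replace") if isinstance(chunk, bytes) else chunk
        for chunk, charset in parts
    )

subject = "=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?="
print(decode_header_lenient(subject))
```

The trade-off is that this guesses at the sender's intent: a header that is simply corrupt will be decoded to something rather than rejected.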
Hi,
I created a ticket: http://bugs.python.org/issue12489
Thomas Güttler