Incoming Mail Attachments

Joshua Smith

Oct 19, 2009, 1:07:58 PM
to google-a...@googlegroups.com
The attachments property on incoming mail is a list of pairs: name,
contents.

(Note that the docs say "attachments is a list of element pairs
containing file types and contents", which is not correct.)

The contents are still encoded. For example:

From nobody Mon Oct 19 15:47:38 2009
content-transfer-encoding: base64

iVBORw0KGgoAAAANSUhEUgAAAaUAAAEUCAIAAAAEJr4pAAAXrGlDQ1BJQ0MgUHJvZmlsZQAAeAHt
...

Do I need to write something to parse this and deal with the different
possible values of content-transfer-encoding? Or is there a built-in
or an appengine API that will do that for me?
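For reference, Python's standard email package can undo the content-transfer-encoding of a MIME part. This is only a minimal sketch, assuming the raw MIME text of the message is available; the sample message and the decoded_attachments helper are made up for illustration and are not part of the App Engine API.

```python
import base64
from email import message_from_string

# Hypothetical raw message, built here only so the sketch is self-contained.
PNG_HEADER = b"\x89PNG\r\n\x1a\n"
RAW = "\n".join([
    "From: sender@example.com",
    "To: app@example.com",
    "Subject: test",
    "MIME-Version: 1.0",
    'Content-Type: multipart/mixed; boundary="BOUNDARY"',
    "",
    "--BOUNDARY",
    "Content-Type: text/plain",
    "",
    "See attached.",
    "--BOUNDARY",
    'Content-Type: image/png; name="pixel.png"',
    'Content-Disposition: attachment; filename="pixel.png"',
    "Content-Transfer-Encoding: base64",
    "",
    base64.b64encode(PNG_HEADER).decode("ascii"),
    "--BOUNDARY--",
    "",
])


def decoded_attachments(raw_text):
    """Return (filename, bytes) pairs for every part that carries a filename."""
    msg = message_from_string(raw_text)
    result = []
    for part in msg.walk():
        filename = part.get_filename()
        if filename is None:
            continue  # containers and plain bodies have no filename
        # decode=True undoes base64 or quoted-printable according to the
        # part's Content-Transfer-Encoding header.
        result.append((filename, part.get_payload(decode=True)))
    return result


attachments = decoded_attachments(RAW)
```

Here attachments holds one pair, the file name and the raw PNG bytes, with no hand-written handling of the individual encodings.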

Also, when there are references to these attachments within the HTML,
such as
<img src="cid:b2ea907c-e731-43ff-a932-ba307782ba34" >
it appears that there is no way to know the mapping between
the cid values and the attachments.

This makes it impossible for me to display the email in a browser with
the images in-line.
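For reference, in raw MIME the link between a cid: URL and a part is the part's Content-ID header. The sketch below assumes the raw MIME text is available, which the attachments list alone does not give you; cid_map, inline_images and the sample message are hypothetical names made up for illustration, and rewriting to data: URIs is just one possible way to display the images.

```python
import base64
from email import message_from_string

# Hypothetical sample message with one inline image.
PNG_HEADER = b"\x89PNG\r\n\x1a\n"
CID = "b2ea907c-e731-43ff-a932-ba307782ba34"
SAMPLE = "\n".join([
    "MIME-Version: 1.0",
    'Content-Type: multipart/related; boundary="BOUNDARY"',
    "",
    "--BOUNDARY",
    "Content-Type: text/html",
    "",
    '<img src="cid:%s" >' % CID,
    "--BOUNDARY",
    "Content-Type: image/png",
    "Content-Transfer-Encoding: base64",
    "Content-ID: <%s>" % CID,
    "",
    base64.b64encode(PNG_HEADER).decode("ascii"),
    "--BOUNDARY--",
    "",
])


def cid_map(raw_text):
    """Map each Content-ID (angle brackets removed) to (content_type, bytes)."""
    mapping = {}
    for part in message_from_string(raw_text).walk():
        cid = part.get("Content-ID")
        if cid is None:
            continue
        key = cid.strip().strip("<>")
        mapping[key] = (part.get_content_type(), part.get_payload(decode=True))
    return mapping


def inline_images(html, mapping):
    """Replace every cid: reference in html with an equivalent data: URI."""
    for cid, (content_type, data) in mapping.items():
        encoded = base64.b64encode(data).decode("ascii")
        html = html.replace("cid:" + cid, "data:%s;base64,%s" % (content_type, encoded))
    return html
```

Calling inline_images(html_body, cid_map(SAMPLE)) returns the HTML with the img src pointing at the embedded image data instead of the cid: reference.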

-Joshua

Joshua Smith

Oct 19, 2009, 3:21:16 PM
to google-a...@googlegroups.com
OK, I've made a little progress on this. It turns out that the
attachments member uses an EncodedPayload object to hold the
contents.

This has the delightful little function decode(), which *almost* gives
you the raw data. However, there is a line in mail.py that goes:

payload = payload.decode(self.encoding).lower()

Why does it say "lower()" at the end? No idea, but it completely
trashes the data.
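To illustrate why lower-casing decoded data is destructive, here is a small stand-alone sketch. It does not use mail.py; it only mimics the decode-then-lower sequence with the standard base64 module, using the first eight bytes of a PNG file as made-up input.

```python
import base64

# The 8-byte PNG signature stands in for a real attachment.
PNG_HEADER = b"\x89PNG\r\n\x1a\n"
encoded = base64.b64encode(PNG_HEADER)

decoded = base64.b64decode(encoded)  # correct: the original bytes come back
lowered = decoded.lower()            # mimics the trailing .lower() quoted above

print(repr(decoded))
print(repr(lowered))
```

Every byte that happens to be an upper-case ASCII letter is changed, so the "PNG" in the signature becomes "png" and the file is no longer a valid image.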

-Joshua

Rafe

Oct 19, 2009, 3:34:09 PM
to Google App Engine
The content is in fact an instance of EncodedPayload. You can
decode the content by calling 'decode' on the EncodedPayload object.

Hope that this is what you were looking for.

- Rafe Kaplan

Joshua Smith

Oct 19, 2009, 3:41:19 PM
to google-a...@googlegroups.com
Yes, it was, although there is apparently a bug in decode() in that it
lower-cases everything!

This bug affects not only attachments, but also HTML bodies!

Here's the work-around:

for a in message.attachments:
  name = a[0]
  payload = a[1].payload.decode(a[1].encoding)
  # ... do something with name and payload ...

Joshua Smith

Oct 19, 2009, 4:02:31 PM
to google-a...@googlegroups.com
I've reported the bug:


Here is a more complete work-around:

def goodDecode(encodedPayload):
  encoding = encodedPayload.encoding
  payload = encodedPayload.payload
  if encoding and encoding.lower() != '7bit':
    payload = payload.decode(encoding)
  return payload


usage:

    bodies = message.bodies(content_type='text/html')
    allBodies = ""
    for body in bodies:
      allBodies = allBodies + "\n" + goodDecode(body[1])

and:

      if hasattr(message, 'attachments'):
        for a in message.attachments:
          msg.attachmentNames.append(a[0])
          msg.attachmentContents.append(db.Blob(goodDecode(a[1])))
        msg.put()
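The work-around above relies on Python 2 string codecs such as 'base64'. As a rough sketch only, the same idea could be written with the standard base64 and quopri modules, which also work on Python 3. The good_decode name and its two-argument signature are assumptions for illustration; it takes the payload bytes and the encoding name directly rather than an EncodedPayload object.

```python
import base64
import quopri


def good_decode(payload, encoding):
    """Decode payload bytes according to a Content-Transfer-Encoding value."""
    enc = (encoding or "7bit").lower()
    if enc == "base64":
        return base64.b64decode(payload)
    if enc == "quoted-printable":
        return quopri.decodestring(payload)
    # 7bit, 8bit and binary mean the payload is already raw data.
    return payload


print(good_decode(b"aGVsbG8=", "base64"))
```

Unlike the decode() bug described in this thread, nothing here changes the case of the decoded data.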


Joshua Smith

Oct 19, 2009, 4:12:48 PM
to google-a...@googlegroups.com
I've created issue 2291 for the cid: problem.

Joshua Smith

Oct 20, 2009, 9:21:56 AM
to google-a...@googlegroups.com
Got a notice that this was fixed. I presume that means the fix will
ship in the next SDK update.