Retrieving email attachments using the email input node

Neil Kolban

Mar 5, 2016, 7:35:50 PM
to Node-RED
My goal is to receive incoming emails which may include attachments and process those attachments.  I sent an email to my mail server which is being monitored by an Email input node.  I dumped the resulting message and am not understanding what I am seeing:

{ from: '"Neil Kolban" <kol...@test.com>',
  topic: 'Hello',
  date: 'Sat, 5 Mar 2016 18:25:53 -0600',
  header:
   { 'return-path': [ 'kol...@test.com' ],
     received: [ 'from win7x64 (win7-x64 [127.0.0.1]) by WIN7-X64 ; Sat, 5 Mar 2016 18:25:53 -0600' ],
     'message-id': [ '<4623E5766C51464D86158BD396C21422@win7x64>' ],
     from: [ '"Neil Kolban" <kol...@test.com>' ],
     to: [ '"Neil Kolban" <kol...@test.com>' ],
     subject: [ 'Hello' ],
     date: [ 'Sat, 5 Mar 2016 18:25:53 -0600' ],
     'mime-version': [ '1.0' ],
     'content-type': [ 'multipart/mixed; boundary="----=_NextPart_000_0057_01D1770C.73AA2B60"' ],
     'x-priority': [ '3' ],
     'x-msmail-priority': [ 'Normal' ],
     importance: [ 'Normal' ],
     'x-mailer': [ 'Microsoft Windows Live Mail 16.4.3564.1216' ],
     'x-mimeole': [ 'Produced By Microsoft MimeOLE V16.4.3564.1216' ] },
  payload: 'This is a multi-part message in MIME format.\r\n\r\n------=_NextPart_000_0057_01D1770C.73AA2B60\r\n',
  _msgid: '4e336706.b1cc98' }


My email had a simple text payload ("Hello world") and an attachment ... I can see neither the email body nor the attachment ... I'm going to go and study more ... but if anyone has any thoughts, they are more than welcome.

Julian Knight

Mar 5, 2016, 7:57:00 PM
Hi Neil,

Multipart messages are created when the email content is more than plain text. From the output you've listed, the node is not returning the whole message; only the first part of the body appears in the payload.

Try sending an HTML-formatted email and see what you get. If you only get the plain-text part, the node is ignoring the remaining parts of the message. If you see both the plain text and the HTML, the node is ignoring only the attachments.
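For reference, a MIME multipart body (RFC 2046) is divided by delimiter lines made of `--` followed by the `boundary` parameter of the `content-type` header, and it ends with the same line plus a trailing `--`. In the dump above, the payload appears to stop right after the first delimiter (`------=_NextPart_000_0057_01D1770C.73AA2B60`, which is `--` plus the boundary), so it seems to hold only the preamble. A minimal sketch of boundary-based splitting, assuming the raw body text is available; `getBoundary` and `splitMultipart` are hypothetical helper names, and the sample message is illustrative, not the actual email:

```javascript
// Sketch: split a multipart MIME body on its boundary (RFC 2046) instead of
// searching for "Content-Type". getBoundary/splitMultipart are hypothetical.

// Extract the boundary parameter (quoted or unquoted) from a Content-Type value.
function getBoundary(contentType) {
  const m = /boundary\s*=\s*(?:"([^"]+)"|([^;\s]+))/i.exec(contentType);
  return m ? m[1] || m[2] : null;
}

// Return the raw text (headers + content) of each body part, dropping the
// preamble before the first delimiter and anything after the close delimiter.
function splitMultipart(body, boundary) {
  const pieces = body.split("--" + boundary);
  const parts = [];
  for (const piece of pieces.slice(1)) { // pieces[0] is the preamble
    if (piece.startsWith("--")) break;   // "--boundary--" ends the body
    // Drop the line break after the delimiter and the CRLF before the next
    // one (RFC 2046 treats that CRLF as part of the delimiter).
    parts.push(piece.replace(/^\r?\n/, "").replace(/\r?\n$/, ""));
  }
  return parts;
}

// Illustrative message using the boundary from the header in the dump.
const boundary = getBoundary(
  'multipart/mixed; boundary="----=_NextPart_000_0057_01D1770C.73AA2B60"'
);
const body =
  "This is a multi-part message in MIME format.\r\n\r\n" +
  "--" + boundary + "\r\n" +
  "Content-Type: text/plain\r\n\r\nHello world\r\n" +
  "--" + boundary + "\r\n" +
  'Content-Type: text/plain; name="notes.txt"\r\n' +
  'Content-Disposition: attachment; filename="notes.txt"\r\n\r\n' +
  "attachment text\r\n" +
  "--" + boundary + "--\r\n";

console.log(splitMultipart(body, boundary).length); // 2
```

Each returned part still carries its own headers (Content-Type, Content-Disposition, Content-Transfer-Encoding), which would have to be parsed and decoded separately.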


Neil Kolban

Mar 5, 2016, 11:38:52 PM
I had a good hard look through the logic in our current implementation found here:


and then had a good hard study of the node-imap package found here:


and then wrote some test Node.js applications just using node-imap and an email server/client.

Although this whole area is new to me, I am not convinced that what we have in this node is what we should have. Looking at node-imap, we can ask for a "layout" (structure) of the body of the message. It is either single-part or multi-part. In either case, the structure presented to us tells us what each part is: for any given part of an email, we are told its MIME type (type and subtype), encoding and length.
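node-imap can return this structure when a message is fetched with the `struct: true` option (it arrives as `attrs.struct` on the message's `attributes` event). Going by the example layout in node-imap's README, it is a nested array: a multipart container is an array whose first element describes the container, and each leaf part is an object carrying a `partID`. A minimal sketch, assuming that layout, that flattens it into a list of leaf parts; `flattenStruct` and its output field names are hypothetical, and the sample struct is hand-written in the README's style rather than captured from a server:

```javascript
// Sketch: flatten a node-imap-style body structure into its leaf parts.
// Layout assumed from node-imap's README; flattenStruct is hypothetical.
function flattenStruct(struct, parts = []) {
  for (const node of struct) {
    if (Array.isArray(node)) {
      flattenStruct(node, parts); // a child part or a nested multipart
    } else if (node && node.partID) {
      // A leaf part: record MIME type, encoding, size and disposition.
      parts.push({
        partID: node.partID,
        mimeType: `${node.type}/${node.subtype}`.toLowerCase(),
        encoding: node.encoding,
        size: node.size,
        disposition: node.disposition ? node.disposition.type.toLowerCase() : null,
        filename:
          node.disposition && node.disposition.params
            ? node.disposition.params.filename || null
            : null,
      });
    }
    // Objects without a partID describe a multipart container
    // (e.g. multipart/mixed) and carry no content of their own.
  }
  return parts;
}

// Hand-written sample: plain + HTML alternatives plus one attachment.
const struct = [
  { type: "mixed", params: { boundary: "outer" }, disposition: null },
  [
    { type: "alternative", params: { boundary: "inner" }, disposition: null },
    [{ partID: "1.1", type: "text", subtype: "plain", params: { charset: "utf-8" },
       encoding: "7BIT", size: 13, disposition: null }],
    [{ partID: "1.2", type: "text", subtype: "html", params: { charset: "utf-8" },
       encoding: "7BIT", size: 40, disposition: null }],
  ],
  [{ partID: "2", type: "application", subtype: "octet-stream",
     params: { name: "notes.bin" }, encoding: "BASE64", size: 98,
     disposition: { type: "attachment", params: { filename: "notes.bin" } } }],
];

console.log(flattenStruct(struct).map((p) => p.partID + " " + p.mimeType).join(", "));
// 1.1 text/plain, 1.2 text/html, 2 application/octet-stream
```

The `partID` values could then be requested individually (for example through node-imap's `bodies` fetch option), which would avoid scanning the whole body text.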

Our current implementation ignores this structure data returned by "node-imap" and instead manipulates the body as a whole as a string, splitting the text on "Content-Type".

What I'd like to suggest is a rework of the core structure processing of the email message, using the structure map supplied to us by "node-imap". However, this could lead to changes in semantics.

Imagine the following circumstances:

1. A plain text only email
2. An HTML text only email
3. A plain text email with one or more attachments
4. An HTML text email with one or more attachments
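A rough sketch of telling the four cases apart from a flat list of parts, assuming each part is described by a `mimeType` and a `disposition` (hypothetical field names). It treats any part with an `attachment` disposition as an attachment and any non-attachment `text/html` part as making the email HTML; inline images and other edge cases are deliberately ignored:

```javascript
// Sketch: return 1-4 matching the four cases listed above.
// Part shape { mimeType, disposition } is an assumption, not node-imap's.
function classifyEmail(parts) {
  const isAttachment = (p) => p.disposition === "attachment";
  const hasAttachments = parts.some(isAttachment);
  // A text/html body part (e.g. inside multipart/alternative) marks the email as HTML.
  const hasHtml = parts.some((p) => !isAttachment(p) && p.mimeType === "text/html");
  if (!hasAttachments) return hasHtml ? 2 : 1;
  return hasHtml ? 4 : 3;
}

console.log(classifyEmail([{ mimeType: "text/plain", disposition: null }])); // 1
console.log(
  classifyEmail([
    { mimeType: "text/plain", disposition: null },
    { mimeType: "application/pdf", disposition: "attachment" },
  ])
); // 3
```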

I believe we just about handle (1) and (2) today. My reading is that for (1), the plain text is placed in msg.payload. We also seem to handle (2), with the HTML text placed in msg.html (I'm not sure why we didn't put it in msg.payload for consistency).

For (3) and (4), things start to go wrong. I believe the code is broken: if I send in a plain-text email with one attachment, msg.payload contains neither the plain-text body nor the attachment, and the attachment is not to be found anywhere.

What I'd like to offer is the following redesign, thought through to try to keep existing apps working.

(1) Plain text email only goes to msg.payload
(2) HTML text email only goes to msg.html (I would personally have liked it to go to msg.payload...)
(3) Multi-part emails (which include the above with one or more attachments) go to msg.payload as an array, where each element is an object (still to be designed) that includes the MIME type, the attachment properties (if any) and the payload of that part.
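A minimal sketch of this routing, assuming the parts have already been decoded into objects with `mimeType`, `filename` and `content` fields. Those names, and `buildMsgProposal`, are placeholders, since the element design is explicitly left open:

```javascript
// Sketch of the proposed routing; part/element field names are hypothetical.
function buildMsgProposal(parts) {
  const only = parts.length === 1 ? parts[0] : null;
  if (only && only.mimeType === "text/plain") {
    return { payload: only.content }; // (1) plain text only
  }
  if (only && only.mimeType === "text/html") {
    return { html: only.content }; // (2) HTML only
  }
  // (3) anything multi-part: one element per part, in message order
  return {
    payload: parts.map((p) => ({
      mimeType: p.mimeType,
      filename: p.filename || null, // set for attachments
      content: p.content,
    })),
  };
}

const plainOnly = buildMsgProposal([{ mimeType: "text/plain", content: "Hello world" }]);
const withAttachment = buildMsgProposal([
  { mimeType: "text/plain", content: "Hello world" },
  { mimeType: "application/pdf", filename: "report.pdf", content: Buffer.from("%PDF") },
]);
console.log(typeof plainOnly.payload);              // string
console.log(Array.isArray(withAttachment.payload)); // true
```

Under this scheme msg.payload changes type between case (1) and case (3), so a downstream node would have to check which one it received.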

If we can reach agreement on the above in principle, I'll be happy to write up a more detailed design doc on how it could be implemented. If we agree on that, I'll code it up on a GitHub branch and issue a pull request.

Mark Setrem

Mar 6, 2016, 3:07:25 AM
One reason for having plain text go to msg.payload and HTML go to msg.html is so that nodes further down the flow can easily tell them apart.
Your suggested solution would appear to break this, as every downstream node would have to check whether msg.payload is an array or a string.
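To illustrate the concern, this is roughly the branching a downstream Function node would need if msg.payload could be either a string or an array of part objects. The element fields `mimeType` and `content` are assumptions, since the proposal leaves the element design open; `bodyText` is a hypothetical helper:

```javascript
// Sketch: every consumer must check the type of msg.payload before using it.
function bodyText(msg) {
  if (typeof msg.payload === "string") return msg.payload; // plain-text email
  if (Array.isArray(msg.payload)) {                        // multi-part email
    const part = msg.payload.find((p) => p.mimeType === "text/plain");
    return part ? part.content : null;
  }
  return null; // e.g. an HTML-only email, where the body is in msg.html
}

console.log(bodyText({ payload: "Hello world" })); // Hello world
console.log(
  bodyText({ payload: [{ mimeType: "text/plain", content: "Hello world" }] })
); // Hello world
```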

It would also mean that the Email-in and Email-out nodes would use different schemas for the same constituent parts of an email (the Email-out node handles attachments in the cunningly named msg.attachments).

I would hope that any rework of the nomenclature would be mirrored, where applicable, between the in and out nodes, and that anything that breaks the current functionality of the Email node goes through the same change process as the HTML nodes did.

Julian Knight

Mar 6, 2016, 4:18:38 AM
I've written IMAP parsers in the dim and distant past, when I discovered the hard way that the PHP IMAP implementation of the time handled things appallingly. Still, my knowledge is extremely rusty.

I'll try to find some time to do some tests myself. I noticed, Neil, that you seem to be using Windows Live Mail to send the emails, so it might be worth trying some other mail clients too. Microsoft have, in the past, played fast and loose with the RFCs.

Dave C-J

Mar 6, 2016, 7:03:19 AM
Neil,

yes, the email nodes (both in and out) are well overdue a good look, so it would be great if you want to pick up the baton. There are a number of outstanding issues in the issue logs: in the node-red-node project, numbers 123, 132 and your own 182, and in the main node-red project, 641 in particular. All should ideally be at least considered in the redesign.

As Mark points out, the idea was that msg.payload should be just the text of the body, even if an email came in HTML style. My understanding is that the email body should contain either the text from within the HTML or an alternative (a bit like the alternative text that img tags offer in HTML). So msg.html is left to contain the full markup, and we try to extract the main body text if no alternative is offered. So it's not really option 1 or 2, but rather 1, or 1 and 2.
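A rough sketch of that fallback: use the text/plain alternative when one exists, otherwise derive text from the HTML. The regex-based stripping here is deliberately naive (it decodes only a few entities and is not an HTML parser), and `pickBodyText` is a hypothetical name; a real node would probably want a proper HTML-to-text library:

```javascript
// Sketch: choose the text for msg.payload; msg.html would keep the full markup.
function pickBodyText(plain, html) {
  if (typeof plain === "string" && plain.length > 0) return plain; // real alternative
  if (typeof html !== "string") return "";
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, "") // drop non-visible content
    .replace(/<br\s*\/?>/gi, "\n")                          // line breaks
    .replace(/<\/(p|div)>/gi, "\n")                         // block ends
    .replace(/<[^>]*>/g, "")                                // remaining tags
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&") // decode &amp; last to avoid double-decoding
    .trim();
}

console.log(pickBodyText("", "<p>Hello <b>world</b></p>")); // Hello world
```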

Likewise, attachments should be consistent with the outbound node: ideally msg.attachments as an array, again leaving the payload to be just the body text of the email (as a string), and msg.topic the subject line.
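A minimal sketch of this message layout: msg.topic for the subject, msg.payload for the body text, msg.html for the markup, and msg.attachments as an array. The attachment fields (`filename`, `contentType`, `content` as a Buffer) are an assumption, picked to resemble nodemailer-style attachment objects so the in and out nodes could share a shape; `buildEmailMsg` and its input object are hypothetical:

```javascript
// Sketch of the suggested Email-in msg layout; input shape is hypothetical.
function buildEmailMsg({ subject, from, plain, html, attachments = [] }) {
  const msg = {
    topic: subject,       // subject line
    from: from,
    payload: plain || "", // always a string: just the body text
  };
  if (html) msg.html = html; // full markup, only when the email had an HTML part
  // Always an array (possibly empty), so downstream nodes need no type check.
  msg.attachments = attachments.map((a) => ({
    filename: a.filename,
    contentType: a.contentType,
    // Assumption: string content is base64-encoded.
    content: Buffer.isBuffer(a.content) ? a.content : Buffer.from(a.content, "base64"),
  }));
  return msg;
}

const msg = buildEmailMsg({
  subject: "Hello",
  from: "sender@example.com",
  plain: "Hello world",
  attachments: [{ filename: "notes.txt", contentType: "text/plain", content: "SGVsbG8=" }],
});
console.log(msg.attachments[0].content.toString()); // Hello
```

Keeping msg.attachments an array even when empty means msg.payload never changes type, which addresses the concern about downstream type checks.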

Finally, the existing node only retrieves the single most recent email. This has been crying out to be more configurable: one option being to get x emails (of course only sending on new ones, as today), and the other to allow it to set the read flags and then fetch only unread emails (not so good if you "share" the inbox with a human).
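A minimal sketch of the "get x emails, send on only new ones" option, keyed on IMAP UIDs. It assumes UIDs only increase within a mailbox, which IMAP guarantees while the mailbox's UIDVALIDITY is unchanged; the message objects are plain stand-ins rather than node-imap results, and `makeNewMessageFilter` is a hypothetical name. Unlike the read-flag option, it leaves the mailbox untouched, so a shared inbox is unaffected:

```javascript
// Sketch: remember the highest UID forwarded and pass on only newer messages.
// State lives in memory here, so a restart would forward the last batch again.
function makeNewMessageFilter() {
  let highestUid = 0;
  return function filterNew(messages) {
    const fresh = messages
      .filter((m) => m.uid > highestUid)
      .sort((a, b) => a.uid - b.uid); // oldest first
    if (fresh.length > 0) highestUid = fresh[fresh.length - 1].uid;
    return fresh;
  };
}

const filterNew = makeNewMessageFilter();
console.log(filterNew([{ uid: 41 }, { uid: 42 }]).length);            // 2 (first poll)
console.log(filterNew([{ uid: 42 }, { uid: 43 }]).map((m) => m.uid)); // [ 43 ]
```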

The other issue is that nodemailer (used by the out node) went through a massive refactor, splitting into several smaller pieces, so ideally that side would be updated and brought in line too.

But yes, a PR would be gratefully considered!

(Neil - as you are an IBMer the CLA process is slightly different... will contact you via email)