Reading Outlook .msg file using Python

John Henry

Oct 10, 2010, 5:51:38 PM
Hello all:

I have a need to read .msg files exported from Outlook. A Google search
turned up a few very old posts about the topic but nothing really
useful. The email module in Python is no help: everything comes back
blank and it can't even tell whether there are attachments. I did find a
Java library to do the job, and I suppose when push comes to shove I would
have to learn Jython and see if it can invoke the Java library. But
before I do that, can somebody point me to a Python/COM solution?

I don't need to gain access to Exchange or Outlook. I just need to
read the .msg file and extract information + attachments from it.

Thanks,

Lawrence D'Oliveiro

Oct 10, 2010, 11:27:29 PM
In message
<f4cdb6df-f191-4635...@k15g2000pre.googlegroups.com>, John
Henry wrote:

> I have a need to read .msg files exported from Outlook.

Try using EML format instead. That’s plain text.
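
If the messages can be saved as EML, the standard library's email package is enough to pull out the subject, body and attachment names. What follows is only a minimal sketch, assuming the file is an ordinary MIME message; the helper name `summarize_eml` and the sample message are made up for illustration and are not from this thread.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def summarize_eml(raw):
    """Return (subject, body text, attachment filenames) for raw EML bytes."""
    msg = message_from_bytes(raw, policy=policy.default)
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    names = [part.get_filename() for part in msg.iter_attachments()]
    return str(msg["Subject"]), body, names


# Build a small sample message in memory instead of reading a real file.
sample = EmailMessage()
sample["Subject"] = "Quarterly report"
sample["From"] = "sender@example.com"
sample["To"] = "reader@example.com"
sample.set_content("See attached.")
sample.add_attachment(b"a,b\n1,2\n", maintype="application",
                      subtype="octet-stream", filename="report.csv")

subject, body, names = summarize_eml(sample.as_bytes())
print(subject, names)
```

For a file on disk, the same function could be fed the result of `open(path, "rb").read()`.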

John Henry

Oct 11, 2010, 12:02:52 AM
On Oct 10, 8:27 pm, Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand> wrote:
> In message
> <f4cdb6df-f191-4635-ab2f-c3431069f...@k15g2000pre.googlegroups.com>, John
> Henry wrote:
> > I have a need to read .msg files exported from Outlook.
>
> Try using EML format instead. That’s plain text.

Thanks for the reply. I would have to check to see if my client's
Outlook can export in EML format directly. I don't want to use a
converter.

Tim Golden

Oct 11, 2010, 6:56:01 AM
to pytho...@python.org

.msg files are Compound Documents -- a file format which obviously
seemed like a jolly good idea at the time, but which frustrates
me every time I have to do anything with it :)

Hopefully this code snippet will get you going. The idea is to open
the compound document using the Structured Storage API. That gives
you an IStorage-ish object which you can then convert to an IMessage-ish
object with the convenience function OpenIMsgOnIStg. At that point you
enter the marvellous world of Extended MAPI. The get_body_from_stream
function does a Q&D job of pulling the body text out. You can get
attachments as well: look at the PyIMessage docs, but come back if
you need help with that:

<code>
import os, sys

from win32com.mapi import mapi, mapitags
from win32com.shell import shell, shellcon
from win32com.storagecon import *
import pythoncom

def get_body_from_stream (message):
    # Read the PR_BODY property through an IStream, one chunk at a time
    CHUNK_SIZE = 10000
    stream = message.OpenProperty (mapitags.PR_BODY,
                                   pythoncom.IID_IStream, 0, 0)
    chunks = []
    while True:
        data = stream.read (CHUNK_SIZE)
        if data:
            chunks.append (data)
        else:
            break
    # The body comes back as raw bytes; decode them once at the end
    return b"".join (chunks).decode ("utf16")

def main (filepath):
    mapi.MAPIInitialize ((mapi.MAPI_INIT_VERSION, 0))
    # Open the .msg file as a structured storage (compound document)
    storage_flags = STGM_DIRECT | STGM_READ | STGM_SHARE_EXCLUSIVE
    storage = pythoncom.StgOpenStorage (filepath, None, storage_flags,
                                        None, 0)
    # Wrap the IStorage in an IMessage so Extended MAPI can interpret it
    mapi_session = mapi.OpenIMsgSession ()
    message = mapi.OpenIMsgOnIStg (mapi_session, None, storage, None, 0,
                                   mapi.MAPI_UNICODE)
    print (get_body_from_stream (message))

if __name__ == '__main__':
    main (*sys.argv[1:])

</code>

TJG
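
Since a .msg file is a Compound Document, a cheap sanity check before handing a file to the Structured Storage API is to look at its first eight bytes. This is a sketch only: the function name and the in-memory sample data are invented here, and the check cannot tell a .msg apart from other compound files (old .doc and .xls files start with the same signature).

```python
import io

# Signature that starts every OLE2 Compound File (structured storage) document.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_compound_file(stream):
    """Return True if the binary stream starts with the OLE2 signature."""
    return stream.read(len(OLE2_MAGIC)) == OLE2_MAGIC


# In-memory stand-ins for real files, for illustration only.
fake_msg = io.BytesIO(OLE2_MAGIC + b"\x00" * 504)
fake_eml = io.BytesIO(b"From: someone@example.com\r\n\r\nHello")
print(looks_like_compound_file(fake_msg), looks_like_compound_file(fake_eml))
```

With a real file, pass `open(path, "rb")` as the stream.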

John Henry

Oct 11, 2010, 11:39:24 AM

Thank you for your reply.

I am trying your code, but when it gets to the line:

> mapi.MAPIInitialize ((mapi.MAPI_INIT_VERSION, 0))

I get the error message:

Either there is no default mail client or the current mail client
cannot fulfill the messaging request. Please run Microsoft
Outlook ... client.

I have Outlook (not Express - Outlook 2002) running and I did set it
to be the default mail client. Does MAPI work with Exchange only?
(And why do I need MAPI to read the file?)

Regards,

Tim Golden

Oct 11, 2010, 11:54:53 AM
to pytho...@python.org
On 11/10/2010 4:39 PM, John Henry wrote:

> I am trying your code, but when it gets to the line:
>
>> mapi.MAPIInitialize ((mapi.MAPI_INIT_VERSION, 0))
>
> I get the error message:
>
> Either there is no default mail client or the current mail client
> cannot fulfill the messaging request. Please run Microsoft
> Outlook ... client.
>
> I have Outlook (not Express - Outlook 2002) running and I did set it
> to be the default mail client. Does MAPI work with Exchange only?

No. I was running it with Outlook 2003 installed (not running,
in fact, although it is the default mail client on that machine).

> (And why do I need MAPI to read the file?)

Basically because the MAPI subsystem already contains all
the code to interpret that particular format of structured
storage. If you can find some source of info which tells
you how to parse the format directly, then you can sidestep
MAPI. Presumably this is what is done by the Java code you
mentioned.

I'm afraid I'm not at work at the moment, and I don't run
Outlook on this machine. (So I can't even save an .msg file
to test). FWIW the code did run successfully on my work
machine and produced the plain text of the email, so it
is just a configuration sort of issue. If no-one chips
in with a suggestion in a few hours, might be worth
posting to python-win32; there might be people there who
don't watch this (higher-traffic) list.

I have a vague memory that when I set this kind of thing
up to run on our Helpdesk server where I use this to
ingest incoming emails I did have to install a sort
of server-only alternative to Outlook. I'll try to remote
into the server later to see if I can spot it. But that
still wouldn't explain what the problem was if you were
actually running Outlook in any case.

TJG

John Henry

Oct 12, 2010, 11:59:18 AM

According to:

http://support.microsoft.com/kb/813745

I need to reset my Outlook registry keys. Unfortunately, I don't have
my Office Install CD with me. This would have to wait.

Thanks,

Tim Golden

Oct 12, 2010, 1:31:10 PM
to pytho...@python.org
On 12/10/2010 4:59 PM, John Henry wrote:
> According to:
>
> http://support.microsoft.com/kb/813745
>
> I need to reset my Outlook registry keys. Unfortunately, I don't have
> my Office Install CD with me. This would have to wait.

Thanks for the information; I'm keen to see if you're able
to use the solution I posted once this fix is in place.

TJG

John Henry

Oct 17, 2010, 1:39:42 AM

Okay, after fixing the Outlook reg entries as described above, I am
able to go further. Now, the code stops at:

message = mapi.OpenIMsgOnIStg (mapi_session, None, storage, None, 0,
mapi.MAPI_UNICODE)

with an error message:

pywintypes.com_error: (-2147221242, 'OLE error 0x80040106', None,
None)

Tim Golden

Oct 17, 2010, 7:45:19 AM
to pytho...@python.org

Strange. That's UNKNOWN_FLAGS. Try the call without the MAPI_UNICODE,
ie make the last param zero. Maybe there's something with Outlook 2002...
I've never tried it myself.

TJG
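
For anyone matching the two numbers in that traceback: COM errors reach Python as signed 32-bit integers, and masking with 0xFFFFFFFF gives the unsigned HRESULT that appears in the "OLE error" text. A small sketch (the helper name is made up for illustration):

```python
def hresult_to_hex(code):
    """Format a signed 32-bit COM error code as an unsigned hex HRESULT."""
    return "0x%08X" % (code & 0xFFFFFFFF)


print(hresult_to_hex(-2147221242))  # prints 0x80040106
```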

John Henry

Oct 17, 2010, 2:37:38 PM

Okay, omitting the MAPI_UNICODE works!

Now, I have to search and see how I get the header info, and extract
the attachment.

John Henry

Oct 17, 2010, 3:25:31 PM

Not knowing anything about MAPI, I tried a number of the MAPI property
tags; the only one that works appears to be PR_SUBJECT.
PR_CLIENT_SUBMIT_TIME, PR_CREATION_TIME and so forth don't work.

Tim Golden

Oct 18, 2010, 7:09:21 AM
to pytho...@python.org
On 17/10/2010 20:25, John Henry wrote:
> Not knowing anything about MAPI, I tried a number of the MAPI property
> tags; the only one that works appears to be PR_SUBJECT.
> PR_CLIENT_SUBMIT_TIME, PR_CREATION_TIME and so forth don't work.

I'll try to fish out some of the code we use, but for most
of the fields, having got the body, I simply used the email
module to parse it. (Obviously that doesn't give you anything
which isn't included in the MIME version of the email).

I have a lightweight wrapper that does some of the MAPI
spadework. If you're interested, let me know and I can
send it across or post it somewhere.

TJG

John Henry

Oct 18, 2010, 1:25:22 PM

In case you didn't receive my message sent via "reply to author",
please send the wrapper to e c s 1 7 4 9 (at) gmail (dot) com.

Thanks,

John Henry

Oct 19, 2010, 5:46:22 PM
On Oct 17, 4:45 am, Tim Golden <t...@westpark-club.org.uk> wrote:

Looks like this flag is valid only if you are getting messages
directly from Outlook. When reading the msg file, the flag is
invalid.

Same issue when accessing attachments. In addition, the MAPITable
method does not seem to work at all when trying to get attachments out
of the msg file (it works when dealing with a message in an Outlook
mailbox). Either way, the display_name doesn't work when trying to
display the filename of the attachment.

I was able to get the date by using the PR_TRANSPORT_MESSAGE_HEADERS
property tag from mapitags.
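
Once the transport headers are available as text, the standard email package can pick the date out of them. A rough sketch, assuming the property value has already been decoded to a string holding an ordinary RFC 5322 header block; `date_from_headers` and the sample headers are invented for illustration.

```python
from email.parser import HeaderParser
from email.utils import parsedate_to_datetime


def date_from_headers(header_text):
    """Parse an RFC 5322 header block and return its Date as a datetime."""
    headers = HeaderParser().parsestr(header_text)
    raw_date = headers["Date"]
    return parsedate_to_datetime(raw_date) if raw_date else None


# Made-up header block standing in for the property value.
sample_headers = (
    "Received: from mail.example.com\r\n"
    "From: sender@example.com\r\n"
    "Subject: Test\r\n"
    "Date: Sun, 17 Oct 2010 14:37:38 -0700\r\n"
    "\r\n"
)
print(date_from_headers(sample_headers))
```

The function returns None when no Date header is present, which may well happen for messages that were never sent through a mail transport.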

John Henry

Oct 19, 2010, 5:51:59 PM

By "this flag" I mean the mapi.MAPI_UNICODE flag.

Tim Golden

Oct 20, 2010, 4:41:09 AM
to pytho...@python.org
On 19/10/2010 22:48, John Henry wrote:
> Looks like this flag is valid only if you are getting messages
> directly from Outlook. When reading the msg file, the flag is
> invalid.
>
> Same issue when accessing attachments. In addition, the MAPITable
> method does not seem to work at all when trying to get attachments out
> of the msg file (it works when dealing with a message in an Outlook
> mailbox). Either way, the display_name doesn't work when trying to
> display the filename of the attachment.
>
> I was able to get the date by using the PR_TRANSPORT_MESSAGE_HEADERS
> property tag from mapitags.

Ah, thanks. As you will have realised, my code is basically geared
to reading an Outlook/Exchange message box. I hadn't really tried
it on individual message files, except my original excerpt. If it
were opportune, I'd be interested in seeing your working code.

TJG

John Henry

Oct 20, 2010, 12:01:35 PM

When (and if) I finally figure out how to get it done, I surely will
make the code available. It's pretty close. All I need is to figure
out how to extract the attachments.

Too bad I don't know (and don't have) C#. This guy did it so cleanly:

http://www.codeproject.com/KB/office/reading_an_outlook_msg.aspx?msg=3639675#xx3639675xx

Maybe somebody who knows both C# and Python can convert the code
(not much code) and then the Python community will have it. As it
stands, it seems the solution is available in Java, C#, VB .... but
not Python.

John Henry

Oct 20, 2010, 1:13:16 PM
On Oct 20, 9:01 am, John Henry <john106he...@hotmail.com> wrote:
> On Oct 20, 1:41 am, Tim Golden <m...@timgolden.me.uk> wrote:
>
>
>
> > On 19/10/2010 22:48, John Henry wrote:
>
> > > Looks like this flag is valid only if you are getting messages
> > > directly from Outlook.  When reading the msg file, the flag is
> > > invalid.
>
> > > Same issue when accessing attachments.  In addition, the MAPITable
> > > method does not seem to work at all when trying to get attachments out
> > > of the msg file (it works when dealing with a message in an Outlook
> > > mailbox).  Either way, the display_name doesn't work when trying to
> > > display the filename of the attachment.
>
> > > I was able to get the date by using the PR_TRANSPORT_MESSAGE_HEADERS
> > > property tag from mapitags.
>
> > Ah, thanks. As you will have realised, my code is basically geared
> > to reading an Outlook/Exchange message box. I hadn't really tried
> > it on individual message files, except my original excerpt. If it
> > were opportune, I'd be interested in seeing your working code.
>
> > TJG
>
> When (and if) I finally figure out how to get it done, I surely will
> make the code available.  It's pretty close.  All I need is to figure
> out how to extract the attachments.
>
> Too bad I don't know (and don't have) C#.  This guy did it so cleanly:
>
> http://www.codeproject.com/KB/office/reading_an_outlook_msg.aspx?msg=...
>
> Maybe somebody who knows both C# and Python can convert the code
> (not much code) and then the Python community will have it.  As it
> stands, it seems the solution is available in Java, C#, VB .... but
> not Python.

BTW: For the benefit of future searches on this topic, with the code
listed above where:

storage_flags = STGM_DIRECT | STGM_READ | STGM_SHARE_EXCLUSIVE

I had to change it to:

storage_flags = STGM_DIRECT | STGM_READ | STGM_SHARE_DENY_NONE | STGM_TRANSACTED

otherwise I get a sharing violation (see
http://efreedom.com/Question/1-1086814/Opening-OLE-Compound-Documents-Read-StgOpenStorage).

For now, I am using a brute force method
(http://mail.python.org/pipermail/python-win32/2009-February/008825.html)
to get the names of the attachments, and if I need to extract the
attachments, I pop up the message in Outlook and let Outlook extract
the files. Ugly, but it fits my client's need for now. Hopefully there
will be a cleaner solution down the road.

Here's my code for brute forcing attachments out of the msg file (very
ugly):

def get_attachments(self, fileID):
    # Python 2 code: returns the attachment names found in a .msg file
    #from win32com.storagecon import *
    from win32com import storagecon
    import pythoncom

    flags = (storagecon.STGM_READ | storagecon.STGM_SHARE_DENY_NONE |
             storagecon.STGM_TRANSACTED)
    try:
        storage = pythoncom.StgOpenStorage (fileID, None, flags)
    except:
        return []

    flags = storagecon.STGM_READ | storagecon.STGM_SHARE_EXCLUSIVE
    attachments=[]
    for data in storage.EnumElements ():
        print data[0], data[1]
        # data[1] == 2 marks a stream; 007D001F holds the transport headers
        if data[1] == 2 or data[0] == "__substg1.0_007D001F":
            stream = storage.OpenStream (data[0], None, flags)
            try:
                msg = stream.Read (data[2])
            except:
                pass
            else:
                # Crude UTF-16 to ASCII: drop the NUL bytes from the repr
                msg = repr (msg).replace("\\x00","").strip("'").replace("%23","#")
                if data[0] == "__substg1.0_007D001F":
                    try:
                        attachments.append(msg.split("name=\"")[1].split("\"")[0])
                    except:
                        pass

    return attachments
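
The repr-and-replace step above is standing in for a UTF-16 decode. A tidier sketch of just that part, assuming the stream bytes are UTF-16-LE text and that the headers really do carry `name="..."` parameters (which will not be true of every message); the function name and sample data are invented for illustration.

```python
import re

# Matches name="..." and, as a substring, filename="..." in MIME headers.
NAME_PARAM = re.compile(r'name="([^"]+)"')


def attachment_names(stream_bytes):
    """Decode UTF-16-LE header bytes and collect quoted name= parameters."""
    text = stream_bytes.decode("utf-16-le", errors="replace")
    names = []
    for name in NAME_PARAM.findall(text):
        if name not in names:  # name= and filename= often repeat a value
            names.append(name)
    return names


# Made-up header text, encoded the way a ...001F stream stores it.
sample = (
    'Content-Type: application/pdf; name="invoice.pdf"\r\n'
    'Content-Disposition: attachment; filename="invoice.pdf"\r\n'
    'Content-Type: image/png; name="logo.png"\r\n'
).encode("utf-16-le")
print(attachment_names(sample))
```

Unlike the split-based version, this collects every quoted name rather than only the first one.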

Jon Clements

Oct 21, 2010, 4:34:00 AM
> otherwise I get a sharing violation (see http://efreedom.com/Question/1-1086814/Opening-OLE-Compound-Documents...).

Only just noticed this thread, and had something similar. I took the
following approach:-

(I'm thinking this might be relevant as you mentioned checking whether
your client's Outlook could export .EML directly, which indicates (to
me at least) that you have some control over that...)

- Set up an IMAP email server on a machine (in this case linux and
dovecot)
- Got client to set up a new account in Outlook for the new server
- Got client to use the Outlook interface to copy relevant emails (or
the whole lot) to new server
- Used the standard imaplib and related modules to do what was needed

From my POV I didn't have to mess around with proprietary formats or
deal with files. From the client's POV, they were able to, with an
interface familiar to them, add/remove what needed processing. It also
enabled multiple people at the client's site to contribute their
emails that might have been relevant for the task.

The program created a sub-folder under the new server, did the
processing, and injected the results to that folder, the client could
then drag 'n' drop to whatever folder they personally used for filing
their end.

They felt in control, and I didn't have to bugger about with
maildir/mbox/pst/eml, whether it was outlook/thunderbird/evolution etc...

If you're only doing "an email here or email there" and don't want to
(or can't) go the full-blown mail server route, then a possible option
would be to mock an IMAP server (most likely using the twisted framework)
that upon an 'APPEND' processes the 'received' email appropriately...
(kind of a server/procmail route...)


Just a couple of ideas.

Cheers,

Jon.
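
The imaplib step in the list above might look roughly like the following. It is a sketch under assumptions, not the code that was actually used: the function name, folder name, host and credentials are placeholders, and `conn` is expected to behave like a logged-in `imaplib.IMAP4` object.

```python
import email
from email import policy


def subjects_in_folder(conn, folder="INBOX"):
    """Return the Subject of every message in an IMAP folder."""
    conn.select(folder, readonly=True)
    status, data = conn.search(None, "ALL")
    subjects = []
    for num in data[0].split():
        status, parts = conn.fetch(num, "(RFC822)")
        raw = parts[0][1]  # first item is an (envelope, message bytes) tuple
        msg = email.message_from_bytes(raw, policy=policy.default)
        subjects.append(str(msg["Subject"]))
    return subjects


# Typical use against a real server (host and credentials are placeholders):
#   import imaplib
#   conn = imaplib.IMAP4_SSL("imap.example.com")
#   conn.login("user", "password")
#   print(subjects_in_folder(conn))
#   conn.logout()
```

Taking the connection as a parameter keeps the parsing logic separate from the login details, so it can be exercised without a live server.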


Tim Golden

Oct 21, 2010, 4:48:54 AM
to pytho...@python.org
On 21/10/2010 09:34, Jon Clements wrote:
> Only just noticed this thread, and had something similar. I took the
> following approach:-
>
> (I'm thinking this might be relevant as you mentioned checking whether
> your client's Outlook could export .EML directly, which indicates (to
> me at least) that you have some control over that...)
>
> - Set up an IMAP email server on a machine (in this case linux and
> dovecot)
> - Got client to set up a new account in Outlook for the new server
> - Got client to use the Outlook interface to copy relevant emails (or
> the whole lot) to new server
> - Used the standard imaplib and related modules to do what was needed

Nice lateral approach. It would also be possible to do this same
kind of thing via the native Microsoft toolset alone if the OP
has access to the appropriate Outlook / Exchange accounts. (Indeed,
Exchange itself can act as an IMAP server which might be another
approach). I confess I was starting from the original "Can I read an
.msg file?" question.

TJG

John Henry

Oct 22, 2010, 12:55:16 PM
On Oct 21, 1:48 am, Tim Golden <m...@timgolden.me.uk> wrote:
> On 21/10/2010 09:34, Jon Clements wrote:
>
> > Only just noticed this thread, and had something similar. I took the
> > following approach:-
>
> > (I'm thinking this might be relevant as you mentioned checking whether
> > your client's Outlook could export .EML directly, which indicates (to
> > me at least) that you have some control over that...)
>
> > - Set up an IMAP email server on a machine (in this case linux and
> > dovecot)
> > - Got client to set up a new account in Outlook for the new server
> > - Got client to use the Outlook interface to copy relevant emails (or
> > the whole lot) to new server
> > - Used the standard imaplib and related modules to do what was needed
>
> Nice lateral approach. It would also be possible to do this same
> kind of thing via the native Microsoft toolset alone if the OP
> has access to the appropriate Outlook / Exchange accounts. (Indeed,
> Exchange itself can act as an IMAP server which might be another
> approach). I confess I was starting from the original "Can I read an
> .msg file?" question.
>
> TJG

Found some useful information:

http://www.fileformat.info/format/outlookmsg/index.htm

At least it takes some mystery out of the msg file. It explains why
my attempt to read the msg file fails sometimes. It appears some
messages don't have header info (or at least not in the format
described). I need to keep trying and see how I can get the header
info.
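
One piece of the format that is easy to decode is the stream naming seen earlier in the thread: after the `__substg1.0_` prefix come four hex digits of property ID and four of property type, so `__substg1.0_007D001F` is property 0x007D (the transport headers) stored as Unicode text. A small sketch, with an invented helper name and only a handful of property types listed:

```python
import re

# A few MAPI property types, keyed by the last four hex digits of the name.
PROPERTY_TYPES = {0x001E: "PT_STRING8", 0x001F: "PT_UNICODE", 0x0102: "PT_BINARY"}

STREAM_NAME = re.compile(r"^__substg1\.0_([0-9A-Fa-f]{4})([0-9A-Fa-f]{4})$")


def parse_stream_name(name):
    """Split a '__substg1.0_XXXXYYYY' name into (property id, type name)."""
    match = STREAM_NAME.match(name)
    if match is None:
        return None
    prop_id = int(match.group(1), 16)
    prop_type = int(match.group(2), 16)
    return prop_id, PROPERTY_TYPES.get(prop_type, "unknown")


print(parse_stream_name("__substg1.0_007D001F"))
```

Names that do not follow this simple pattern (other storages, or streams with an extra suffix) come back as None in this sketch.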
