
Parsing Email Headers


T

Mar 11, 2010, 2:29:53 PM
All I'm looking to do is to download messages from a POP account and
retrieve the sender and subject from their headers. Right now I'm 95%
of the way there, except I can't seem to figure out how to *just* get
the headers. Problem is, certain email clients also include headers
in the message body (e.g. if you're replying to a message), and these
are all picked up as additional senders/subjects. So, I want to avoid
processing anything from the message body.

Here's a sample of what I have:

# For each line in message
for j in M.retr(i+1)[1]:
    # Create email message object from returned string
    emailMessage = email.message_from_string(j)
    # Get fields
    fields = emailMessage.keys()
    # If email contains "From" field
    if emailMessage.has_key("From"):
        # Get contents of From field
        from_field = emailMessage.__getitem__("From")

I also tried using the following, but got the same results:
emailMessage = email.Parser.HeaderParser().parsestr(j, headersonly=True)

Any help would be appreciated!

MRAB

Mar 11, 2010, 3:13:48 PM
to Python List

If you're using poplib then use ".top" instead of ".retr".
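A minimal sketch of the `.top` approach, written for Python 3 (the code in this thread is Python 2). `POP3.top(which, howmuch)` returns the header block plus `howmuch` lines of the body, so `howmuch=0` should yield only the headers, at least on servers that implement TOP properly. The helper name `sender_and_subject` is hypothetical. Note that all returned lines are joined and parsed as one message rather than one line at a time.

```python
import poplib
from email.parser import BytesHeaderParser


def sender_and_subject(conn: poplib.POP3, which: int) -> tuple:
    """Return (From, Subject) for message number `which` (1-based).

    `sender_and_subject` is a hypothetical helper, not part of poplib.
    """
    # TOP with 0 body lines asks the server for the header block only.
    _response, lines, _octets = conn.top(which, 0)
    # Python 3 poplib returns each line as bytes with its terminator
    # stripped, so rejoin them into a single message before parsing.
    raw = b"\r\n".join(lines)
    msg = BytesHeaderParser().parsebytes(raw)
    return msg.get("From"), msg.get("Subject")
```

Connecting would look roughly like `conn = poplib.POP3_SSL(host)` followed by `conn.user(name)` and `conn.pass_(password)`, where `host`, `name` and `password` are placeholders.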

Grant Edwards

Mar 11, 2010, 3:20:56 PM
On 2010-03-11, T <misceve...@gmail.com> wrote:
> All I'm looking to do is to download messages from a POP account and
> retrieve the sender and subject from their headers. Right now I'm 95%
> of the way there, except I can't seem to figure out how to *just* get
> the headers.

The headers are separated from the body by a blank line.

> Problem is, certain email clients also include headers in the message
> body (i.e. if you're replying to a message), and these are all picked
> up as additional senders/subjects. So, I want to avoid processing
> anything from the message body.

Then stop when you see a blank line.
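Stopping at the first blank line can be sketched like this. It assumes the lines are already text (`str`); Python 3's `poplib` returns `bytes`, so they would need decoding first. The function name `parse_headers_only` is made up for illustration.

```python
import email


def parse_headers_only(lines):
    """Parse only the header block of a message given as a list of lines.

    `parse_headers_only` is a hypothetical helper. Everything after the
    first blank line is body text (including any "From:" or "Subject:"
    lines copied from an earlier message) and is ignored.
    """
    header_lines = []
    for line in lines:
        if not line.strip():
            break  # the first blank line ends the header block
        header_lines.append(line)
    # Parse the collected header lines together, as one message.
    return email.message_from_string("\n".join(header_lines))
```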

Or retrieve just the headers.

--
Grant Edwards   grant.b.edwards at gmail.com   Yow! My life is a patio of fun!

T

Mar 11, 2010, 5:44:09 PM

I'm still having the same issue, even with .top. Am I missing
something?

for j in M.top(i+1, 0)[1]:
    emailMessage = email.message_from_string(j)
    #emailMessage = email.Parser.HeaderParser().parsestr(j, headersonly=True)

    # Get fields
    fields = emailMessage.keys()
    # If email contains "From" field
    if emailMessage.has_key("From"):
        # Get contents of From field
        from_field = emailMessage.__getitem__("From")

Is there another way I should be using to retrieve only the headers
(not those in the body)?

MRAB

Mar 11, 2010, 6:06:40 PM
to pytho...@python.org

The documentation does say:

"""unfortunately, TOP is poorly specified in the RFCs and is
frequently broken in off-brand servers."""

All I can say is that it works for me with my ISP! :-)

T

Mar 11, 2010, 6:19:36 PM
Thanks for your suggestions! Here's what seems to be working. It's
basically the same thing I originally had, but it first checks whether
the line is blank:

response, lines, bytes = M.retr(i+1)

# For each line in message
for line in lines:
    if not line.strip():
        M.dele(i+1)
        break

    emailMessage = email.message_from_string(line)

Thomas Guettler

Mar 12, 2010, 7:59:54 AM

Hi T,

Wait, this code looks strange.

You delete the email if it contains an empty line? I use something like this:

message='\n'.join(connection.retr(msg_num)[1])
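Under Python 3, `poplib` hands back each line as `bytes`, so the `'\n'.join(...)` above would need a bytes separator, and `email.message_from_bytes` in place of `message_from_string`. A minimal sketch that walks the whole mailbox this way; `list_senders_and_subjects` is a hypothetical name, and the message count comes from `POP3.stat()`.

```python
import email
import poplib


def list_senders_and_subjects(conn: poplib.POP3):
    """Yield (From, Subject) for every message in the mailbox.

    `list_senders_and_subjects` is a hypothetical helper, not part of poplib.
    """
    count, _size = conn.stat()
    for msg_num in range(1, count + 1):  # POP3 message numbers start at 1
        _response, lines, _octets = conn.retr(msg_num)
        # Join the bytes lines back into one message and parse it once,
        # so body lines are never mistaken for headers.
        msg = email.message_from_bytes(b"\r\n".join(lines))
        yield msg["From"], msg["Subject"]
```

As noted above, this still downloads every full message; swapping `retr(msg_num)` for `top(msg_num, 0)` would fetch only the headers on servers where TOP works.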

Your code:
    emailMessage = email.message_from_string(line)
creates an email object from only *one* line!
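A small self-contained demonstration of why parsing one line at a time picks up extra senders. The message is made up: a reply whose body repeats the original headers unquoted, as some mail clients do. Parsed line by line, every header-looking body line becomes a "header"; parsed once as a whole, the body is ignored.

```python
import email

# Hypothetical reply: real headers, a blank line, then a body that
# repeats the original message's headers without quoting them.
lines = [
    "From: t@example.com",
    "Subject: Re: status",
    "",
    "Thanks!",
    "",
    "From: boss@example.com",
    "Subject: status",
]

# One message object per line: body lines look just like headers.
per_line = [email.message_from_string(line)["From"] for line in lines]
per_line_senders = [s for s in per_line if s is not None]

# One message object for the whole text: headers end at the blank line.
whole = email.message_from_string("\n".join(lines))
whole_senders = whole.get_all("From")

print(per_line_senders)  # ['t@example.com', 'boss@example.com']
print(whole_senders)     # ['t@example.com']
```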

You retrieve the whole message (so you don't save bandwidth), but maybe that's
what you want.


Thomas

--
Thomas Guettler, http://www.thomas-guettler.de/
E-Mail: guettli (*) thomas-guettler + de
