
Parsing an email message


Bernie Cosell

Jan 10, 2022, 6:37:33 PM
I need to parse an email message and pull its various parts apart. Is
there some not-so-difficult way to do it? Courriel looks like it would be
just the thing, unfortunately it won't run on Windows. The Mail:: and
Email:: modules seem very complicated when all I want to do is feed it a
complete message and get at the various pieces [body, attachments, etc] and
the headers [from, date, etc]. Is there a _simple_ package that'll do
that? If not, are there tutorials or the like for Mail:: and/or Email::?
They seem to be much more focused on managing actual mailboxes {Mail::} and
*composing* emails [Email::] and give pretty short shrift [to my struggling
with the man pages] to just *parsing* an email. Thanks!

/Bernie\
--
Bernie Cosell Fantasy Farm Fibers
ber...@fantasyfarm.com Pearisburg, VA
--> Too many people, too few sheep <--

Rainer Weikusat

Jan 11, 2022, 12:31:51 PM
Bernie Cosell <ber...@fantasyfarm.com> writes:
> I need to parse an email message and pull its various parts apart. Is
> there some not-so-difficult way to do it? Courriel looks like it would be
> just the thing, unfortunately it won't run on Windows. The Mail:: and
> Email:: modules seem very complicated when all I want to do is feed it a
> complete message and get at the various pieces [body, attachments, etc] and
> the headers [from, date, etc]. Is there a _simple_ package that'll do
> that? If not, are there tutorials or the like for Mail:: and/or Email::?
> They seem to be much more focused on managing actual mailboxes {Mail::} and
> *composing* emails [Email::] and give pretty short shrift [to my struggling
> with the man pages] to just *parsing* an email. Thanks!

There is no simple way to parse an e-mail message: That's literally the
most complicated grammar I ever wrote a parser for.

Henry Law

Jan 11, 2022, 5:58:38 PM
On Mon, 10 Jan 2022 18:37:26 -0500, Bernie Cosell wrote:

> Is there a _simple_ package that'll do that? If not, are there
> tutorials or the like for Mail:: and/or Email::?

I use Email::MIME. How "simple" it is depends on your point of view but,
as someone else has already observed, MIME email has a complicated
structure (e.g. separate parts within one message are themselves
Email::MIME structures), and you're not going to get a /simple/ piece of
code that understands that.

However, if you pass the text of a single message to Email::MIME, the
object will then give you a "header_pairs" method, which will give you a
great deal of what you need. And there's a "body" method which will give
you the body, surprisingly.
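
As a rough illustration of the two methods just mentioned, here is a minimal sketch. The message text is made up for the example, and it assumes Email::MIME is installed from CPAN; it is not the code offered above, only one way the calls might be used.

```perl
use strict;
use warnings;
use Email::MIME;

# A tiny single-part message, invented purely for illustration.
my $message = <<'END';
From: alice@example.com
To: bob@example.com
Subject: Hello
Date: Sun, 9 Jan 2022 13:27:13 -0500
Content-Type: text/plain; charset=utf-8

Just a short note.
END

# Email::MIME wants the text of the message, not a file name.
my $email = Email::MIME->new($message);

# header_pairs returns a flat list: name, value, name, value, ...
my @pairs = $email->header_pairs;
my @names;
while (my ($name, $value) = splice @pairs, 0, 2) {
    push @names, $name;
    print "$name: $value\n";
}

# A single header can also be fetched by name.
my $subject = $email->header('Subject');

# body returns the content with any Content-Transfer-Encoding undone.
my $body = $email->body;
print "\n$body";
```

For a multipart message the interesting content lives in the subparts, so "body" on the top-level object is mostly useful for simple, single-part mail like this one.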

If you want to send me a mail (address is valid) I can let you have great
wodges of code that does this stuff; maybe reading through it and taking
out the bits you don't need might help you. It's object-oriented so you
might even be able to use the packages.

--
Henry Law n e w s @ l a w s h o u s e . o r g
Manchester, England

Andreas Karrer

Jan 11, 2022, 7:18:40 PM
* Bernie Cosell <ber...@fantasyfarm.com>:
> I need to parse an email message and pull its various parts apart. Is
> there some not-so-difficult way to do it? Courriel looks like it would be

There is no really simple way because mail headers and MIME are not
simple. A MIME message may be an arbitrarily complex tree of parts,
parts may be items of a whole lot of media types such as text, html,
images, videos, pdf etc. Then there is the further complexity of
"multipart/alternative", where you will have to decide by some
heuristic which of the alternatives you want to extract or display.

I'd recommend Email::MIME, maybe that qualifies as "not-so-difficult".

"arbitrarily complex tree" is a hint that a recursive approach should
be used.

This skeleton passes the mail message in $message to Email::MIME for
parsing. The "showparts" sub then displays a summary of each direct
subpart and calls itself recursively for that subpart. It uses
Email::MIME::ContentType to parse the "Content-Type" headers, which may
be quite complex, too.

    use Email::MIME;
    use Email::MIME::ContentType;

    my $email = Email::MIME->new($message);
    sub showparts;
    sub showparts {
        my $item = shift;
        my $indent = shift;
        my $i = 1;
        for my $part ($item->subparts) {
            my $ct = parse_content_type($part->content_type);
            my $len = length $part->body;
            print "part$indent $i: $ct->{type}/$ct->{subtype}, $len bytes\n";
            showparts $part, "$indent $i";
            $i++;
        }
    }
    showparts $email, "";

If you are, for example, just interested in all pdf attachments, it
might be enough to pick out the parts with a Content-Type of
application/pdf or application/x-pdf.
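
A minimal sketch of that kind of filter, reusing the same recursive walk over "subparts". The helper name pdf_parts and the sample message are hypothetical, invented for the example; the regex simply matches the two media types named above.

```perl
use strict;
use warnings;
use Email::MIME;

# Hypothetical helper: return every leaf part whose Content-Type is
# application/pdf or application/x-pdf, at any nesting depth.
sub pdf_parts {
    my ($part) = @_;
    my @subparts = $part->subparts;
    # A multipart container: recurse into its children.
    return map { pdf_parts($_) } @subparts if @subparts;
    # A leaf part: keep it only if the media type matches.
    my $ct = $part->content_type // '';
    return $ct =~ m{^application/(?:x-)?pdf\b}i ? ($part) : ();
}

# Sample message, invented for illustration. The attachment is the
# base64 encoding of the nine bytes "%PDF-1.4\n", not a real document.
my $message = <<'END';
From: alice@example.com
To: bob@example.com
Subject: Report attached
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset=utf-8

See the attached report.
--XYZ
Content-Type: application/pdf; name="report.pdf"
Content-Disposition: attachment; filename="report.pdf"
Content-Transfer-Encoding: base64

JVBERi0xLjQK
--XYZ--
END

my @pdfs = pdf_parts(Email::MIME->new($message));
for my $pdf (@pdfs) {
    # body returns the decoded bytes, ready to be written to a file.
    printf "%s: %d bytes\n", $pdf->filename, length $pdf->body;
}
```

Matching on the Content-Type alone will miss PDFs that a mail client labels as application/octet-stream, so checking the filename extension as well may be worthwhile in practice.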



- Andi

Bernie Cosell

Jan 19, 2022, 1:40:00 PM
Bernie Cosell <ber...@fantasyfarm.com> wrote:

} I need to parse an email message and pull its various parts apart. Is
} there some not-so-difficult way to do it?

Wow -- thanks for all the info. I knew MIME messages were messy but I
didn't really realize just *how* messy. I think I'll need to pin down
more exactly what I want from the message and then focus on
finding/extracting just that.

Bernie Cosell

Jan 26, 2022, 12:18:34 PM
Bernie Cosell <ber...@fantasyfarm.com> wrote:

} I need to parse an email message and pull its various parts apart. Is
} there some not-so-difficult way to do it?

I'm still struggling with this and I can't figure out what I'm doing wrong.
I've been trying to start simple and ease my way into the morass [and thanks
for all the sample code and advice... alas, I'm still kinda lost]. I tried a
very very simple program:
-------------------------------------------------------
#!/usr/bin/perl
use v5.10 ;
use strict;
use warnings ;
use Email::Simple ;
use Email::MIME ;
use Email::MIME::ContentType ;
use Email::Simple::Header ;

foreach my $msg (@ARGV)
  { checkmsg($msg) ; }
exit ;

sub checkmsg
{   my $email = Email::Simple->new($_[0]) ;
    my @header_names = $email->header_names ;
    say scalar(@header_names) ;
    foreach my $header (@header_names)
      { say "$header" ; }
    exit ;
}
---------------------------------------------------------

I tried it with a simple message [headers in part]
---------------------
[...]
Content-Type: multipart/alternative;
    boundary="Apple-Mail=_AB70B143-E35C-42EB-86E0-84730EB5E4A7"
Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.120.0.1.13\))
Date: Sun, 9 Jan 2022 13:27:13 -0500
Subject: Getting involved on state level
Message-Id: <840F9FC5-346F-4D62...@swva.net>
X-Mailer: Apple Mail (2.3654.120.0.1.13)
X-PMFLAGS: 570966400 0 65537 PT49NPRZ.CNM
[...]

--Apple-Mail=_AB70B143-E35C-42EB-86E0-84730EB5E4A7
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
    charset=utf-8
--------------------------------------

I don't care about sorting out the MIME section, I just want to see if I
can get the headers parsed.. but when I try it:

D:\Desktop\>showparts Mailbox\multipart
0

What am I doing wrong? THANKS!! /bernie\

Bernie Cosell

Jan 26, 2022, 12:45:22 PM
Bernie Cosell <ber...@fantasyfarm.com> wrote:

} Bernie Cosell <ber...@fantasyfarm.com> wrote:
}
} } I need to parse an email message and pull its various parts apart. Is
} } there some not-so-difficult way to do it?
}
} I'm still struggling with this ...

Please ignore. When I looked again I realized the idiot mistake I had
made. DUH. It wants the *text* of the message, not a stupid file-name.
When I did the open()... $msg=<..> it all magically worked. What a dolt I
am... Sorry to bother y'all
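
For anyone who hits the same problem, a minimal sketch of the fix described here: read each file into a string first, then hand that string to Email::Simple. The helper name slurp is hypothetical, and this is one way to do the open() and read, not necessarily the exact code used.

```perl
use strict;
use warnings;
use Email::Simple;

# Hypothetical helper: read a whole file into one string.
sub slurp {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;              # no record separator: read everything at once
    my $text = <$fh>;
    close $fh;
    return $text;
}

for my $path (@ARGV) {
    # Pass the message text, not the file name, to the parser.
    my $email = Email::Simple->new(slurp($path));
    my @names = $email->header_names;
    print scalar(@names), " headers in $path\n";
    print "  $_\n" for @names;
}
```

Passing the file name itself makes the parser treat that short string as the entire message, which is why the earlier run printed 0 headers.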

/Bernie\