Reading a unix mbox file?


Jens-Uwe Mager

Nov 12, 2012, 11:12:27 AM
to golan...@googlegroups.com
I have the need to read a unix style mbox file containing a lot of email messages and process each of those. Unfortunately a lot of these messages have rather arcane MIME content types and quite a few even use quoted-printable as encoding. Is there any library out there that would help parsing such a mail box?


Scott Lawrence

Nov 12, 2012, 11:18:09 AM
to Jens-Uwe Mager, golan...@googlegroups.com
Many moons ago, I wrote a simple mail archive -> html converter that also had
to do this. The relevant github repos are github.com/bytbox/go-mail and
github.com/bytbox/sloc. IIRC, quoted-printable was handled correctly. I don't
handle fancy mime stuff in the headers, though. It might be a good starting
place for you.

--
Scott Lawrence

go version go1.0.3
Linux jagadai 3.6.6-1-ARCH #1 SMP PREEMPT Mon Nov 5 11:57:22 CET 2012 x86_64 GNU/Linux

Jens-Uwe Mager

Nov 12, 2012, 12:01:51 PM
to golan...@googlegroups.com, Jens-Uwe Mager
Looks good, go getting right now. But you wanted to say github.com/bytbox/slark?

Scott Lawrence

Nov 12, 2012, 12:05:20 PM
to Jens-Uwe Mager, golan...@googlegroups.com
On Mon, 12 Nov 2012, Jens-Uwe Mager wrote:

> Looks good, go getting right now. But you wanted to say
> github.com/bytbox/slark?

Oh, foo. Yeah, thanks.

James Hillyerd

Nov 12, 2012, 7:12:54 PM
to golan...@googlegroups.com, Jens-Uwe Mager
On Monday, November 12, 2012 8:18:23 AM UTC-8, Scott Lawrence wrote:
> On Mon, 12 Nov 2012, Jens-Uwe Mager wrote:
>
> > I have the need to read a unix style mbox file containing a lot of email
> > messages and process each of those. Unfortunately a lot of these messages
> > have rather arcane MIME content types and quite a few even use
> > quoted-printable as encoding. Is there any library out there that would
> > help parsing such a mail box?
>
> Many moons ago, I wrote a simple mail archive -> html converter that also had
> to do this. The relevant github repos are github.com/bytbox/go-mail and
> github.com/bytbox/sloc. IIRC, quoted-printable was handled correctly. I don't
> handle fancy mime stuff in the headers, though. It might be a good starting
> place for you.


You can also take a look at my go.enmime library: https://github.com/jhillyerd/go.enmime

Scott's library does more for parsing email headers (and slark has the logic to read mbox format), but I think go.enmime will handle attachments better if you need that. It's also not clear (to me) on first reading whether go-mail can handle nested MIME multiparts, which you will encounter if the email contains inline images for HTML. go.enmime is still pretty rough around the edges though. :)
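
To make the nested-multipart case concrete, here is a rough sketch using only the Go standard library. It is not how go.enmime or go-mail work internally. It assumes a recent Go release (mime/quotedprintable and io.ReadAll are newer than go1.0.3), and the names leaf, walk, leaves and the sample message are invented for the example.

```go
package main

import (
	"fmt"
	"io"
	"mime"
	"mime/multipart"
	"mime/quotedprintable"
	"net/mail"
	"strings"
)

// leaf is one non-multipart body part (hypothetical type for this sketch).
type leaf struct {
	MediaType string
	Body      string
}

// walk descends through multipart containers recursively and appends every
// leaf part to out. cte is the Content-Transfer-Encoding of this entity.
func walk(contentType, cte string, body io.Reader, out *[]leaf) error {
	if contentType == "" {
		contentType = "text/plain" // default when the header is missing
	}
	mediaType, params, err := mime.ParseMediaType(contentType)
	if err != nil {
		return err
	}
	if strings.HasPrefix(mediaType, "multipart/") {
		mr := multipart.NewReader(body, params["boundary"])
		for {
			p, err := mr.NextPart()
			if err == io.EOF {
				return nil
			}
			if err != nil {
				return err
			}
			// NextPart decodes quoted-printable parts itself and hides
			// that header, so cte is empty here for such parts.
			err = walk(p.Header.Get("Content-Type"),
				p.Header.Get("Content-Transfer-Encoding"), p, out)
			if err != nil {
				return err
			}
		}
	}
	// Only a top-level (non-multipart) message body reaches this branch
	// still quoted-printable encoded.
	if strings.EqualFold(cte, "quoted-printable") {
		body = quotedprintable.NewReader(body)
	}
	b, err := io.ReadAll(body)
	if err != nil {
		return err
	}
	*out = append(*out, leaf{MediaType: mediaType, Body: string(b)})
	return nil
}

// leaves parses one raw message and returns its leaf parts in order.
func leaves(raw string) ([]leaf, error) {
	m, err := mail.ReadMessage(strings.NewReader(raw))
	if err != nil {
		return nil, err
	}
	var out []leaf
	err = walk(m.Header.Get("Content-Type"),
		m.Header.Get("Content-Transfer-Encoding"), m.Body, &out)
	return out, err
}

// sample is a made-up message: multipart/related wrapping multipart/alternative.
const sample = "Content-Type: multipart/related; boundary=outer\r\n\r\n" +
	"--outer\r\n" +
	"Content-Type: multipart/alternative; boundary=inner\r\n\r\n" +
	"--inner\r\n" +
	"Content-Type: text/plain\r\n" +
	"Content-Transfer-Encoding: quoted-printable\r\n\r\n" +
	"caf=C3=A9\r\n" +
	"--inner\r\n" +
	"Content-Type: text/html\r\n\r\n" +
	"<p>hi</p>\r\n" +
	"--inner--\r\n" +
	"--outer--\r\n"

func main() {
	ls, err := leaves(sample)
	if err != nil {
		panic(err)
	}
	for _, l := range ls {
		fmt.Printf("%s: %q\n", l.MediaType, l.Body)
	}
}
```

This sketch ignores base64 bodies, charsets other than UTF-8, and encoded words in headers, which is where a dedicated library earns its keep.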

-james