
Getting data out of Mozilla Thunderbird with Python?


Anthony Papillion

Dec 8, 2015, 1:22:31 PM

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Hello Everyone,

I have a TON of email (years) stored in my Thunderbird. My backup
strategy for the last few years has been to periodically dump it all
in a tar file, encrypt that tar file, and move it up to the cloud.
That way, if my machine ever crashes, I don't lose years of email.

But I've been thinking about bringing Python into the mix to build a
bridge between Thunderbird and SQLite or MySQL (probably sqlite) where
all mail would be backed up to a database where I could run analytics
against it and search it more effectively.

I'm looking for a way to get at the mail stored in Thunderbird using
Python and, so far, I can't find anything. I did find the mozmail
package but it seems to be geared more towards testing and not really
the kind of use I need.

Can anyone suggest anything?

Many Thanks,
Anthony Papillion

- --
Phone: 1.845.666.1114
Skype: cajuntechie
PGP Key: 0x028ADF7453B04B15
Fingerprint: C5CE E687 DDC2 D12B 9063 56EA 028A DF74 53B0 4B15

-----BEGIN PGP SIGNATURE-----

iQIcBAEBCgAGBQJWZx+3AAoJEAKK33RTsEsVVa8QAKf1AmFdJsi4/b08vpkfwP3c
akGV98EuZzEva29jr8nnfXGgqw7xD/nDjMyLzuO0/q4Kn7eKpEnxkcGDLSbDgxaW
O8kD5eALHCVlUp9p/h7RMBBAyZ4mH8YC6qwvd5SWtH0TIMR7ClcWmDYwPF1Ahk7n
NAFvTsMl8PSnhcIoWHE4vebN4wHR8gZAxOLI8WVPA2BbER64EXiL00nWBav6UDN5
NUosAAVa549rrH0ibEf7Lada63DRTHCYnESxNIkAAHIO0z69WjnfZQ8gmmGFhuaW
AZzqYV5pIhdRnvrwjCQ06LtUNtz/qPqLbLSWF0hA6lwPKqzNum9EdvS4c1xjcXsU
KpOCTmJXy40x1Oi8h+yT6PGiDxt5VCHCdN8ppToI3HY5pYmoiPgWszJzrqYMz7hz
ruhNFAksKNUSI9QQupYcPw6oKQdnoGWmBH1yvGlZqeZuIxhGEv87oqRISE4NRQLe
yL4aDebwXdDgBzIZvFOFy2W4L43jdravg2/LliSC18iCUKBnIpWhazy7NZHw6h55
h3QP84DeuB/9tPLQUZF+BEJm3I+V8WfSKVVnsSbk/n/chHgYpWnu+h/wpD6lx43x
y0lPJm0ni5LeQM1bK4TsIXVEAOzl8UaOwn/VUG7P6Jnt6VEqvQutWZ0/WEeP1nIX
M7+e9hLlQWtlEbl6ud1K
=Dz7N
-----END PGP SIGNATURE-----

Thomas 'PointedEars' Lahn

Dec 8, 2015, 1:43:16 PM

Anthony Papillion wrote:

> -----BEGIN PGP SIGNED MESSAGE-----

Please don’t do that again.

> I have a TON of email (years) stored in my Thunderbird. My backup
> strategy for the last few years has been to periodically dump it all
> in a tar file, encrypt that tar file, and move it up to the cloud.
> That way, if my machine ever crashes, I don't lose years of email.
>
> But I've been thinking about bringing Python into the mix to build a
> bridge between Thunderbird and SQLite or MySQL (probably sqlite) where
> all mail would be backed up to a database where I could run analytics
> against it and search it more effectively.
>
> I'm looking for a way to get at the mail stored in Thunderbird using
> Python and, so far, I can't find anything. I did find the mozmail
> package but it seems to be geared more towards testing and not really
> the kind of use I need.
>
> Can anyone suggest anything?

Yes.

(Please never ask that question again:
<http://www.catb.org/~esr/faqs/smart-questions.html>)

Thunderbird uses the mbox format to store both e-mails and news messages.

--
PointedEars

Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.

Mark Lawrence

Dec 8, 2015, 3:35:21 PM

On 08/12/2015 18:42, Thomas 'PointedEars' Lahn wrote:
> Anthony Papillion wrote:
>
>> -----BEGIN PGP SIGNED MESSAGE-----
>
> Please don’t do that again.
>

Says who?

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

Cameron Simpson

Dec 8, 2015, 11:22:11 PM

On 08Dec2015 12:21, Anthony Papillion <ant...@cajuntechie.org> wrote:
>I have a TON of email (years) stored in my Thunderbird. [...]
>I've been thinking about bringing Python into the mix to build a
>bridge between Thunderbird and SQLite or MySQL (probably sqlite) where
>all mail would be backed up to a database where I could run analytics
>against it and search it more effectively.
>
>I'm looking for a way to get at the mail stored in Thunderbird using
>Python and, so far, I can't find anything. I did find the mozmail
>package but it seems to be geared more towards testing and not really
>the kind of use I need.
>
>Can anyone suggest anything?

The local message folders in Thunderbird are plain old mbox files, IIRC. You can
read these with the email.* modules and analyse them to your heart's content.

So I'd just write a Python program to read a single mbox file and break it into
messages, make each one into an email Message object, and then process them as
you see fit.

Then point it at each of the TBird mbox files in turn.
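
A minimal sketch of that approach, using the standard library's mailbox
module (which builds on email) and sqlite3, since SQLite was the original
poster's preferred target. The function names, the table name "mail" and
its columns are made up for illustration, and the path to a Thunderbird
folder file is something you would have to look up in your own profile.

```python
import mailbox
import sqlite3
from email.header import decode_header, make_header


def header_text(msg, name):
    """Return a header as readable text, decoding RFC 2047 encoded words."""
    raw = msg.get(name)
    if raw is None:
        return ""
    try:
        return str(make_header(decode_header(str(raw))))
    except Exception:
        # Unknown charset or malformed header: fall back to the raw value.
        return str(raw)


def index_mbox(mbox_path, db_path):
    """Copy a few headers of every message in one mbox file into SQLite.

    Returns the number of messages inserted. The schema is hypothetical.
    """
    box = mailbox.mbox(mbox_path, create=False)
    conn = sqlite3.connect(db_path)
    count = 0
    try:
        with conn:  # one transaction for the whole folder
            conn.execute(
                "CREATE TABLE IF NOT EXISTS mail ("
                "folder TEXT, message_id TEXT, sender TEXT, "
                "subject TEXT, date TEXT)"
            )
            for msg in box:
                conn.execute(
                    "INSERT INTO mail VALUES (?, ?, ?, ?, ?)",
                    (
                        str(mbox_path),
                        header_text(msg, "Message-ID"),
                        header_text(msg, "From"),
                        header_text(msg, "Subject"),
                        header_text(msg, "Date"),
                    ),
                )
                count += 1
    finally:
        conn.close()
        box.close()
    return count
```

Calling index_mbox() once per folder file, with the same db_path each time,
would collect everything in one table. Message bodies are left out here;
they would need extra handling for multipart mail and character sets.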

Cheers,
Cameron Simpson <c...@zip.com.au>

Steven D'Aprano

Dec 9, 2015, 2:43:46 AM

On Wednesday 09 December 2015 05:42, Thomas 'PointedEars' Lahn wrote:

[snip]

Thomas, your sig says:

Please do not cc me. / Bitte keine Kopien per E-Mail.

but you have a Reply-To set. That implies that you want replies to be sent
directly to you by email, not to the list or newsgroup. Is that really what
you want? That seems incompatible with your signature. Which is correct?



--
Steve

Christian Gollwitzer

Dec 9, 2015, 3:04:05 AM

Am 08.12.15 um 19:21 schrieb Anthony Papillion:
> I have a TON of email (years) stored in my Thunderbird. My backup
> strategy for the last few years has been to periodically dump it all
> in a tar file, encrypt that tar file, and move it up to the cloud.
> That way, if my machine ever crashes, I don't lose years of email.
>
> But I've been thinking about bringing Python into the mix to build a
> bridge between Thunderbird and SQLite or MySQL (probably sqlite) where
> all mail would be backed up to a database where I could run analytics
> against it and search it more effectively.
>
> I'm looking for a way to get at the mail stored in Thunderbird using
> Python and, so far, I can't find anything. I did find the mozmail
> package but it seems to be geared more towards testing and not really
> the kind of use I need.

You have several options.

1) As noted before, Thunderbird usually stores mail in mbox format,
which you can read and parse. However, it keeps an extra index file
(.msf) to track deleted messages etc. Until you "compact" the folders,
the messages are not deleted in the mbox file.

2) You can configure it to use maildir instead. Maildir is a directory
where every mail is stored in a single file. That might be easier to
parse and much faster to access.

3) Are you sure that you want to solve the problem using Python?
Thunderbird has excellent filters and global full text search (stored in
sqlite, btw). You can instruct it to archive mails, which means it
creates a folder for each year - once created for a past year, that
folder will never change. This is how I do my mail backup, and these
folders are backed up by my regular backup (TimeMachine). You could also
try to open the full text index with sqlite and run some query on it.

4) Yet another option using Thunderbird alone is IMAP. If you can either
use a commercial IMAP server, have your own server in the cloud or even
write an IMAP server using Python, then Thunderbird can
access/manipulate the mail there as a usual folder.

5) There are converters like Hypermail or MHonArc to create HTML
archives of mbox email files for viewing in a browser.
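
Regarding option 1, here is a rough sketch of how a script could skip
messages that are deleted but not yet compacted away. It assumes that
Thunderbird records per-message flags as a hexadecimal number in an
X-Mozilla-Status header and that bit 0x0008 means "deleted, awaiting
compaction". That bit value is an assumption taken from memory of Mozilla's
flag definitions, so check it against a folder in your own profile before
relying on it. The function names are hypothetical.

```python
import mailbox

# Assumed flag bit for "deleted, not yet compacted"; verify locally.
EXPUNGED = 0x0008


def is_expunged(msg):
    """True if X-Mozilla-Status has the assumed deleted bit set."""
    raw = msg.get("X-Mozilla-Status", "0")
    try:
        flags = int(str(raw).strip(), 16)
    except ValueError:
        # Header present but not a hex number: treat as not deleted.
        return False
    return bool(flags & EXPUNGED)


def live_messages(mbox_path):
    """Yield only the messages in an mbox file not flagged as deleted."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            if not is_expunged(msg):
                yield msg
    finally:
        box.close()
```

Messages without the header are treated as live, which seems the safer
default for a backup tool.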

Christian

Steven D'Aprano

Dec 9, 2015, 6:11:44 AM

On Wed, 9 Dec 2015 07:03 pm, Christian Gollwitzer wrote:

> 1) As noted before, Thunderbird usually stores mail in mbox format,
> which you can read and parse. However, it keeps an extra index file
> (.msf) to track deleted messages etc. Until you "compact" the folders,
> the messages are not deleted in the mbox file.
>
> 2) You can configure it to use maildir instead. Maildir is a directory
> where every mail is stored in a single file. That might be easier to
> parse and much faster to access.

Maildir is also *much* safer too. With mbox, a single error when writing
email to the mailbox will likely corrupt *all* emails from that point on,
so potentially every email in the mailbox. With maildir, a single error
when writing will, at worst, corrupt one email.

Thanks Mozilla, for picking the *less* efficient and *more* risky format as
the default. Good choice!


> 3) Are you sure that you want to solve the problem using Python?
> Thunderbird has excellent filters and global full text search (stored in
> sqlite, btw).

Sqlite is unsafe on Linux systems if you are using ntfs. I have had no end
of database corruption with Firefox and Thunderbird due to this, although
in fairness I haven't had any problems for a year or so now.
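
Given that corruption worry, one cautious way to try the quoted suggestion
of opening the full text index with sqlite is to work on a copy and open it
read-only, so the file Thunderbird is using is never touched. This is only
a sketch: list_tables is a made-up helper, and it deliberately assumes
nothing about the index's schema; it just lists whatever tables exist.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def list_tables(db_file):
    """Copy a SQLite file to a temp directory and list its table names."""
    with tempfile.TemporaryDirectory() as tmp:
        copy = Path(tmp) / Path(db_file).name
        shutil.copy2(db_file, copy)
        # mode=ro makes SQLite refuse any write to the copy as well.
        conn = sqlite3.connect(copy.as_uri() + "?mode=ro", uri=True)
        try:
            rows = conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' ORDER BY name"
            ).fetchall()
        finally:
            conn.close()
    return [name for (name,) in rows]
```

In a Thunderbird profile the index is, as far as I know, the file named
global-messages-db.sqlite, but treat that name as an assumption and check
your profile directory. Copying while Thunderbird is closed avoids picking
up a half-written file.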



--
Steven

srinivas devaki

Dec 9, 2015, 9:07:15 AM

On Dec 9, 2015 4:45 PM, "Steven D'Aprano" <st...@pearwood.info> wrote:
>
> Maildir is also *much* safer too. With mbox, a single error when writing
> email to the mailbox will likely corrupt *all* emails from that point on,
> so potentially every email in the mailbox. With maildir, a single error
> when writing will, at worst, corrupt one email.
>

Maybe frequent backups of the mbox file, plus storing a checksum for each
email, would be faster and safe too.
I wonder if they already do that.
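
To make the checksum idea concrete, here is a small sketch that computes a
SHA-256 digest for every message in an mbox file, using the standard
mailbox and hashlib modules. It is an illustration of the suggestion only,
not something Thunderbird is known to do, and message_checksums is a
hypothetical name.

```python
import hashlib
import mailbox


def message_checksums(mbox_path):
    """Return {message key: SHA-256 hex digest} for one mbox file."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return {
            key: hashlib.sha256(box.get_bytes(key)).hexdigest()
            for key in box.keys()
        }
    finally:
        box.close()
```

Saving the result next to each backup and comparing it with a later run
would show which messages changed or went missing. Note that the keys are
positions within the file, so they shift when a folder is compacted;
comparing the sets of digests is more robust than comparing by key.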

Chris Angelico

Dec 9, 2015, 9:15:33 AM

On Thu, Dec 10, 2015 at 1:06 AM, srinivas devaki
<mr.eight...@gmail.com> wrote:
> On Dec 9, 2015 4:45 PM, "Steven D'Aprano" <st...@pearwood.info> wrote:
>>
>> Maildir is also *much* safer too. With mbox, a single error when writing
>> email to the mailbox will likely corrupt *all* emails from that point on,
>> so potentially every email in the mailbox. With maildir, a single error
>> when writing will, at worst, corrupt one email.
>>
>
> Maybe frequent backups of the mbox file, plus storing a checksum for each
> email, would be faster and safe too.
> I wonder if they already do that.

Yes, because we all know that frequent checking is better than
prevention. That's why MySQL's myisamchk command makes it so much
better than PostgreSQL's transactional DDL.

ChrisA

Grant Edwards

Dec 9, 2015, 12:25:35 PM

On 2015-12-09, Steven D'Aprano <st...@pearwood.info> wrote:

> Thanks Mozilla, for picking the *less* efficient and *more* risky format as
> the default. Good choice!

At least they picked a standard format as the default and gave you the
option to use a different standard format (cf. Microsoft and Outlook).

--
Grant Edwards <grant.b.edwards at gmail.com>
Yow! Are you the self-frying president?

Michael Torrie

Dec 10, 2015, 1:24:10 AM

On 12/09/2015 04:11 AM, Steven D'Aprano wrote:
> Maildir is also *much* safer too. With mbox, a single error when writing
> email to the mailbox will likely corrupt *all* emails from that point on,
> so potentially every email in the mailbox. With maildir, a single error
> when writing will, at worst, corrupt one email.
>
> Thanks Mozilla, for picking the *less* efficient and *more* risky format as
> the default. Good choice!

Not so long ago, many filesystems were very poor at storing lots of
small files. For disk efficiency, storing them in one big file,
periodically compacting the file, was seen as a better way to go. After
all, the mbox format has been around for a very long time for certain reasons
(which no longer exist today). Maildir came later. Back when hard
drives were smaller, it was also not uncommon to run out of inodes in a
file system on a server that had many small files.

Neither of these issues is much of a problem these days. Ext4 added the
ability to store small files right in the inode, so internal
fragmentation (and wasting of space) isn't a big issue anymore.

It's good to know I can configure Thunderbird to use maildir for local
storage. I'll have to make the change here. Will make my backups a lot
easier and faster.

Thomas 'PointedEars' Lahn

Dec 10, 2015, 8:26:38 AM

Michael Torrie wrote:

> It's good to know I can configure Thunderbird to use maildir for local
> storage. I'll have to make the change here. Will make my backups a lot
> easier and faster.

But see also <https://wiki.mozilla.org/Thunderbird/Maildir>. Not all of
those bugs have been resolved/fixed.

--
PointedEars

Twitter: @PointedEars2