[ale] Extracting dovecot email


Lightner, Jeff

Nov 16, 2010, 4:14:10 PM
to Atlanta Linux Enthusiasts

Does anyone know how to extract mail in a usable way from Dovecot mailboxes, given the presence of the “cur” directory (apparently the inbox folder) and .<folder> directories?

 

In the cur directory I find individual ASCII files that more or less correspond to the email I see in a web view. However, in the directories whose names begin with a dot, which correspond to other folders seen in the web view (sans dot), there are files like:

courierimapacl

courierimapuiddb

dovecot.index

dovecot.index.cache

dovecot.index.log

dovecot-uidlist

 

Online I see the dovecot.index is the main index file and that dovecot.index.cache contains the data of the emails.   While I can see some header information in the cache that corresponds to emails seen in web view it is clear there is some formatting here that prevents me from seeing the entire message at command line.

 

We recently bought another company and are attempting to archive the email that is currently hosted by a third-party web/mail hosting company, as we will be eliminating that hosting once we move the users to our in-house systems. I was able to use wget to download everything from the cPanel at the hosting company, including the mail in the format above.

 

Most of what I find online regarding Dovecot email talks about converting things to it rather than away from it or from one of its mailbox types to another.

 

 

 

Michael Trausch

Nov 16, 2010, 4:29:24 PM
to Atlanta Linux Enthusiasts

That is the Maildir format. The cur directory holds the "seen" messages (seen by the user's MUA, not necessarily read by the user). The new directory holds the ones the MUA hasn't seen yet. A message may have multiple hard links, for example if it has been moved or deleted and not yet expunged.

For the most part, you can just treat the directory tree as a message database. One complete message is stored in one file, and the name of the file contains the IMAP flags applicable to it. It should be possible to read them in one file at a time and write them back out as an mbox format file. You need to create the mbox "From " separator line at the start of each message, and prefix a > character to any line inside a message that begins with "From " if one isn't already there... but I do not believe that they need any additional transformation.
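A rough sketch of that conversion in Python, for illustration only. The function names, the MAILER-DAEMON placeholder sender, and the use of each file's modification time for the separator date are assumptions of mine, not anything the hosting setup dictates. The quoting follows the "mboxrd" convention (a > is also added to lines that already read ">From "), which is one of several mbox variants.

```python
import re
import time
from pathlib import Path
from typing import Optional

# Lines starting with "From " (optionally already quoted with ">") must be
# quoted so they are not mistaken for a message separator (mboxrd style).
FROM_LINE = re.compile(rb"^(>*From )", re.MULTILINE)


def to_mbox_entry(raw: bytes, sender: str = "MAILER-DAEMON",
                  when: Optional[float] = None) -> bytes:
    """Wrap one raw Maildir message as one mbox entry.

    `sender` is a placeholder envelope sender (assumption); `when` is a
    Unix timestamp for the separator line, defaulting to the current time.
    """
    body = raw.replace(b"\r\n", b"\n")
    body = FROM_LINE.sub(rb">\1", body)
    if not body.endswith(b"\n"):
        body += b"\n"
    stamp = time.asctime(time.gmtime(when))
    separator = f"From {sender} {stamp}\n".encode("ascii")
    # A blank line terminates each entry.
    return separator + body + b"\n"


def maildir_to_mbox(maildir: Path, out_path: Path) -> int:
    """Append every message file in cur/ and new/ to a single mbox file."""
    count = 0
    with open(out_path, "wb") as out:
        for sub in ("cur", "new"):
            folder = Path(maildir) / sub
            if not folder.is_dir():
                continue
            for msg in sorted(folder.iterdir()):
                if msg.is_file():
                    out.write(to_mbox_entry(msg.read_bytes(),
                                            when=msg.stat().st_mtime))
                    count += 1
    return count
```

Something like maildir_to_mbox(Path("Maildir/.Sent"), Path("Sent.mbox")) would then write one mbox per folder (the paths here are made up). Python's standard mailbox module also provides Maildir and mbox classes that may be a better fit than hand-rolling this.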

--
Sent from my G2 running CyanogenMod!
That is, a phone. :)

On Nov 16, 2010 4:14 PM, "Lightner, Jeff" <jlig...@water.com> wrote:

Lightner, Jeff

Nov 17, 2010, 8:44:27 AM
to Atlanta Linux Enthusiasts

Thanks,

 

While what you say is true for the files in “cur”, as I noted those correspond ONLY to the ones in the user’s “inbox” folder when viewed on the web.

 

The other directories I was speaking of correspond to the user’s OTHER folders (e.g. Sent and others like one named GKAdvantage that is unique to this user).  I can see the full message in the web view of these folders but the files in these directories are NOT the same as the ones in “cur” so do need some method of being extracted.   As I noted in my original post the files in these directories are dovecot cache files.   Simply viewing them as I do the ones in cur only shows me header information – I can not see the body of the email so my assumption is it is encoded within the cache file as described.   It is these other folders I would like to extract.   Is there a way to extract from the cache to get the individual files like the ones seen in cur? 

 


Michael B. Trausch

Nov 17, 2010, 10:20:54 AM
to Atlanta Linux Enthusiasts
On Wed, 2010-11-17 at 08:44 -0500, Lightner, Jeff wrote:
> The other directories I was speaking of correspond to the user’s OTHER
> folders (e.g. Sent and others like one named GKAdvantage that is
> unique to this user). I can see the full message in the web view of
> these folders but the files in these directories are NOT the same as
> the ones in “cur” so do need some method of being extracted. As I
> noted in my original post the files in these directories are dovecot
> cache files. Simply viewing them as I do the ones in cur only shows
> me header information – I can not see the body of the email so my
> assumption is it is encoded within the cache file as described. It
> is these other folders I would like to extract. Is there a way to
> extract from the cache to get the individual files like the ones seen
> in cur?

Without seeing what you're talking about, I'm a bit lost. My dovecot
maildir setup does not seem to share this property; all files are indeed
messages in all of the user's custom folders.

To clarify a bit, a maildir has metainformation, and always three
directories: cur, new, tmp. You'll nearly never see anything in tmp,
it's just a staging area that is used when mail is initially dropped.
Then the message is moved to new, then when the IMAP client logs in they
are moved again to cur.

At the root of the maildir, if you do this:

$ find . -type d \( -name cur -o -name new \)

You'll get a list of all of the message-containing directories in the
maildir tree. The index, cache, etc., aren't necessary since all mail
systems that I am aware of will generate their own when presented with a
mail store that has yet to be indexed.
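The same walk can be sketched in Python if you want a per-folder message count along the way. This assumes the usual Maildir++ layout, where the root itself is the inbox and each dot-prefixed directory is another folder; the function name and the "INBOX" label are my own choices for illustration.

```python
import os
from pathlib import Path
from typing import Iterator, Tuple


def message_dirs(root: Path) -> Iterator[Tuple[str, Path, int]]:
    """Yield (folder name, directory, file count) for each cur/ and new/."""
    root = Path(root)
    for dirpath, dirnames, _filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        parent = Path(dirpath)
        for name in dirnames:
            if name not in ("cur", "new"):
                continue
            directory = parent / name
            # Assumption: the root is the inbox, ".Sent" is the folder "Sent".
            folder = "INBOX" if parent == root else parent.name.lstrip(".")
            count = sum(1 for p in directory.iterdir() if p.is_file())
            yield folder, directory, count


if __name__ == "__main__":
    for folder, directory, count in message_dirs(Path(".")):
        print(f"{folder}\t{directory}\t{count}")
```

If a folder such as Sent shows a count of zero here, the message files for that folder were not part of the download, and no amount of index or cache parsing will bring them back.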

If your setup isn't working this way then I'm not able to help any
further; I'd start looking at the configuration or source code patches
on your local install to see if anything has been changed.

--- Mike
