Gmail duplicates

Robert Turner

Nov 23, 2009, 10:03:58 AM
to larch
I'm using larch to copy mail from a POP3/IMAP account to a Gmail
account. Larch is configured via cron to run every 5 minutes. I'm also
using Google's Gmail fetcher for this account to keep the source
mailbox empty.

My issue is that I routinely (although not always) get duplicate
messages in the Gmail account. I believe that most of these come from
larch running multiple times between Gmail fetches.

Is there a config option in larch I can use to prevent this?

Any other ideas on how to set up a configuration that handles this better?

Thanks,
//rwt

sidney

Nov 23, 2009, 2:00:12 PM
to la...@googlegroups.com
Robert Turner wrote, On 24/11/09 4:03 AM:
> My issue is that I routinely (although not always) get duplicate
> messages in the Gmail account. I believe that most of these come from
> larch running multiple times between Gmail fetches.

I wonder how that is possible. I've seen Gmail delete what it thinks are
extra copies of a message when two different messages have the same
Message-Id header. larch uses the Message-Id as a key in the
~/.larch/larch.db database which should prevent duplicate messages
with the same Message-Id, but even if that didn't work, what I've seen
Gmail do would take care of it.
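
sidney's point is that a store keyed on Message-Id lets each repeated run recognize mail it has already copied. A minimal sketch of that idea in Python, assuming an SQLite file along the lines of ~/.larch/larch.db; the table name seen_messages and the functions open_store and should_copy are hypothetical illustrations, not Larch's actual schema or code:

```python
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) a hypothetical dedupe store keyed on Message-Id."""
    conn = sqlite3.connect(path)
    # Hypothetical schema: one row per Message-Id that has already been copied.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen_messages (message_id TEXT PRIMARY KEY)"
    )
    return conn

def should_copy(conn, message_id):
    """Return True the first time a Message-Id is seen, False on later runs."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO seen_messages (message_id) VALUES (?)", (message_id,)
            )
        return True
    except sqlite3.IntegrityError:
        # PRIMARY KEY violation: this Message-Id was recorded by an earlier run.
        return False

store = open_store()
print(should_copy(store, "<abc@example.com>"))  # True
print(should_copy(store, "<abc@example.com>"))  # False
```

The PRIMARY KEY constraint does the de-duping, so a job run from cron every 5 minutes skips anything it already copied. This only works when the message actually carries a Message-Id.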

Are the Message-Id headers same or different in these duplicate messages?

Robert Turner

Nov 23, 2009, 2:59:16 PM
to la...@googlegroups.com
On Mon, Nov 23, 2009 at 2:00 PM, sidney <sid...@gmail.com> wrote:
> Robert Turner wrote, On 24/11/09 4:03 AM:
> Are the Message-Id headers same or different in these duplicate messages?

I should have looked there first. Interestingly enough, in the
duplicate messages the Message-Id header is missing. Gmail fetcher
will often add a Message-Id header, but that doesn't prevent the
duplication.
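
Checking for the header can be scripted. A minimal sketch using Python's standard email package that reports which raw messages lack a usable Message-Id; the helper names has_message_id and find_id_less are hypothetical:

```python
from email import message_from_string

def has_message_id(raw):
    """Return True if the raw RFC 822 text has a non-empty Message-Id header."""
    msg = message_from_string(raw)
    # Header lookup in the email package is case-insensitive, so this also
    # matches "Message-ID" and "message-id".
    value = msg.get("Message-Id")
    return value is not None and value.strip() != ""

def find_id_less(messages):
    """Return the indexes of messages that have no usable Message-Id."""
    return [i for i, raw in enumerate(messages) if not has_message_id(raw)]

sample = [
    "Message-ID: <1234@example.com>\nSubject: has an id\n\nbody\n",
    "Subject: no Message-Id header\n\nbody\n",
]
print(find_id_less(sample))  # [1]
```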

I guess the larger question is what causes Microsoft Outlook 2003 to
send a message with no ID header?

//rwt

Ryan Grove

Nov 23, 2009, 4:09:35 PM
to la...@googlegroups.com
On Mon, Nov 23, 2009 at 11:59 AM, Robert Turner <robert...@gmail.com> wrote:
> I should have looked there first. Interestingly enough, in the
> duplicate messages the Message-Id header is missing. Gmail fetcher
> will often add a Message-Id header, but that doesn't prevent the
> duplication.
>
> I guess the larger question is what causes Microsoft Outlook 2003 to
> send a message with no ID header?

Only badly-written clients and occasionally automated tools like web
email gateways generate messages without message-ids. Outlook 2003
definitely generates message-ids, so I doubt that's the source of the
problem.

Sidney is right, though: Gmail only de-dupes messages based on
message-ids. When it sees a message without a message-id, it will
assign one internally, but it will not try to de-dupe that message.
Larch also uses message-ids for de-duping, but if one isn't found then
Larch will use a combination of the message's size and its
internaldate as a unique identifier. Your Gmail dupes are probably
occurring because Larch syncs the id-less messages to Gmail first,
then the Gmail fetcher syncs them again and no de-duping occurs
because they lack message-ids.
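
The mechanism described above can be modeled in a few lines. This is a simplified simulation under stated assumptions, not Larch's or Gmail's real code: fallback_dedupe_key mirrors the described Larch rule (Message-Id if present, else size plus INTERNALDATE), and gmail_accept models Gmail as de-duping on Message-Id only and giving each id-less message a fresh ID, generated here with Python's email.utils.make_msgid as a stand-in. All function names are hypothetical.

```python
from email import message_from_string
from email.utils import make_msgid

def _message_id(raw):
    """Return the stripped Message-Id header, or "" if missing or empty."""
    return (message_from_string(raw).get("Message-Id") or "").strip()

def fallback_dedupe_key(raw, internaldate):
    """Hypothetical key in the spirit of the described Larch rule:
    use the Message-Id when present, otherwise size plus INTERNALDATE."""
    message_id = _message_id(raw)
    if message_id:
        return "id:" + message_id
    return "size-date:%d:%s" % (len(raw.encode("utf-8")), internaldate)

def gmail_accept(mailbox, raw):
    """Simplified model of Gmail as described in the thread.

    De-dupes on Message-Id only; an id-less message is given a fresh
    internal ID (make_msgid as a stand-in) and is therefore always stored.
    Returns True if the message was stored.
    """
    message_id = _message_id(raw)
    if not message_id:
        message_id = make_msgid()  # new value on every call
    elif message_id in mailbox:
        return False  # duplicate Message-Id: dropped
    mailbox[message_id] = raw
    return True

# An id-less message copied once by Larch and once by the Gmail fetcher:
no_id = "Subject: no Message-Id\n\nbody\n"
gmail = {}
gmail_accept(gmail, no_id)  # Larch's copy
gmail_accept(gmail, no_id)  # Gmail fetcher's copy
print(len(gmail))  # 2: a duplicate

# The same scenario with a Message-Id is de-duped:
with_id = "Message-Id: <1234@example.com>\nSubject: has an id\n\nbody\n"
gmail = {}
gmail_accept(gmail, with_id)
gmail_accept(gmail, with_id)
print(len(gmail))  # 1
```

Each copy of an id-less message receives its own freshly generated ID, so the two copies never collide. Larch's fallback key lives only in its local database, so it cannot stop the fetcher's copy.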

In any case, why do you need to use both Larch and the Gmail fetcher
to sync your mail? If you want the messages to be deleted from the
source, I'd recommend using just the Gmail fetcher (at least until I
add delete-from-source functionality to Larch).

- Ryan

Robert Turner

Nov 24, 2009, 9:30:32 AM
to la...@googlegroups.com
On Mon, Nov 23, 2009 at 4:09 PM, Ryan Grove <ry...@wonko.com> wrote:
> Only badly-written clients and occasionally automated tools like web
> email gateways generate messages without message-ids. Outlook 2003
> definitely generates message-ids, so I doubt that's the source of the
> problem.

Apparently this is an issue with Outlook 2003. According to some web
posts I've read, the only guaranteed way that Outlook 03 adds the
Message-Id is if it sends mail through an Exchange server.

> In any case, why do you need to use both Larch and the Gmail fetcher
> to sync your mail? If you want the messages to be deleted from the
> source, I'd recommend using just the Gmail fetcher (at least until I
> add delete-from-source functionality to Larch).

The issue I'm trying to solve is based upon: (a) my corporate
POP3/IMAP mailbox size. It's only 50 MB; and (b) Gmail fetcher has no
setting for its polling interval. Once it gets rolling, it only polls
every 40 to 60 minutes.

I'm using the Gmail box as my primary message store. I need it to be
refreshed often AND I need the tiny source mailbox to be cleaned out
regularly.

Is there a good replacement for Gmail fetcher that would clean out the
source mailbox - say on a once weekly basis from a crontab?

//rwt

Ryan Grove

Nov 24, 2009, 7:05:39 PM
to la...@googlegroups.com
On Tue, Nov 24, 2009 at 6:30 AM, Robert Turner <robert...@gmail.com> wrote:
> Apparently this is an issue with Outlook 2003. According to some web
> posts I've read, the only guaranteed way that Outlook 03 adds the
> Message-Id is if it sends mail through an Exchange server.

Interesting! I didn't know that. How Microsoftian.

> The issue I'm trying to solve is based upon: (a) my corporate
> POP3/IMAP mailbox size. It's only 50 MB; and (b) Gmail fetcher has no
> setting for its polling interval. Once it gets rolling, it only polls
> every 40 to 60 minutes.
>
> I'm using the Gmail box as my primary message store. I need it to be
> refreshed often AND I need the tiny source mailbox to be cleaned out
> regularly.

Ah, got it. You might want to look into Fetchmail. It's intended
specifically for this use case, and is much better suited for it than
Larch is: http://fetchmail.berlios.de/

- Ryan

Ryan Grove

Nov 24, 2009, 7:10:46 PM
to la...@googlegroups.com
On Tue, Nov 24, 2009 at 4:05 PM, Ryan Grove <ry...@wonko.com> wrote:
>> I'm using the Gmail box as my primary message store. I need it to be
>> refreshed often AND I need the tiny source mailbox to be cleaned out
>> regularly.
>
> Ah, got it. You might want to look into Fetchmail. It's intended
> specifically for this use case, and is much better suited for it than
> Larch is: http://fetchmail.berlios.de/

I should also mention that fdm (http://fdm.sourceforge.net/) is
another good alternative, especially if you want to deliver the mail
directly to Gmail via IMAP rather than via SMTP.
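
For comparison with SMTP forwarding, delivering via IMAP means uploading the raw message straight into a mailbox with the IMAP APPEND command. A minimal sketch using Python's standard imaplib module; it only illustrates the protocol step and is not how fdm itself is implemented. The function name deliver_via_imap is hypothetical, and the host and credentials in the usage comment are placeholders:

```python
import imaplib
import time

def deliver_via_imap(conn, raw_message, mailbox="INBOX"):
    """Append one raw RFC 822 message (bytes) to `mailbox`.

    `conn` is an already logged-in imaplib.IMAP4 / IMAP4_SSL object.
    No SMTP server is involved: the message is stored directly in the mailbox.
    """
    # INTERNALDATE for the stored copy; this sketch simply uses "now".
    internal_date = imaplib.Time2Internaldate(time.time())
    typ, data = conn.append(mailbox, None, internal_date, raw_message)
    if typ != "OK":
        raise RuntimeError("IMAP APPEND failed: %r" % (data,))
    return typ

# Example usage (placeholder host and credentials, not run here):
# conn = imaplib.IMAP4_SSL("imap.gmail.com")
# conn.login("you@gmail.com", "your-password")
# with open("message.eml", "rb") as f:
#     deliver_via_imap(conn, f.read())
# conn.logout()
```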

- Ryan

Robert Turner

Nov 25, 2009, 1:48:34 AM
to la...@googlegroups.com
On Tue, Nov 24, 2009 at 7:10 PM, Ryan Grove <ry...@wonko.com> wrote:
> I should also mention that fdm (http://fdm.sourceforge.net/) is
> another good alternative, especially if you want to deliver the mail
> directly to Gmail via IMAP rather than via SMTP.

Thanks Ryan - I just got the fetchmail working. Wow - what a program
... what fetchmail needs is a cheatsheet. Anyway, it seems to be
working.

The one downside to using fetchmail for a production application seems
to be the reliance on an SMTP server. As long as that is working, I
guess I'm okay.

Thanks for the tip about fdm. I may explore that at some point as well.

//rwt