
De-duplicating a Maildir directory


d...@brannerchinese.com

Dec 17, 2021, 4:29:44 AM
Does Alpine contain functionality for de-duplicating a Maildir directory?

It sometimes happens that a single message gets saved more than once to an archiving directory, and I'd like to know if there is already functionality for removing such duplicates.

Thanks!

- dpb

d...@brannerchinese.com

Dec 17, 2021, 4:37:15 AM
I'm aware of this free-standing application: https://github.com/kdeldycke/mail-deduplicate

But I'm wondering if there is anything comparable built into Alpine itself.

- dpb

J.O. Aho

Dec 17, 2021, 7:58:09 AM
On 17/12/2021 10.37, d...@brannerchinese.com wrote:
> I'm aware of this free-standing application: https://github.com/kdeldycke/mail-deduplicate
>
> But I'm wondering if there is anything comparable built into Alpine itself.

I think de-duplication is a file system feature. ZFS has such
functionality: it stores only one copy of a block with identical data and
points every reference at that block. When you delete the last file
pointing to that block, the block content is deleted too.
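
As a rough illustration of the block-sharing idea described above, here is a toy in-memory model in Python. It is only a sketch of the general technique (content hashing plus reference counting); the class and method names are made up for the example, and it makes no claim about how ZFS implements this internally.

```python
import hashlib


class BlockStore:
    """Toy content-addressed store: identical blocks are kept only once."""

    def __init__(self):
        self.blocks = {}    # digest -> block bytes
        self.refcount = {}  # digest -> number of references to that block

    def write(self, data):
        """Store a block and return its digest, sharing any identical block."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blocks:
            self.blocks[digest] = data
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        return digest

    def release(self, digest):
        """Drop one reference; free the block when no references remain."""
        self.refcount[digest] -= 1
        if self.refcount[digest] == 0:
            del self.refcount[digest]
            del self.blocks[digest]
```

Note that this kind of de-duplication only saves space; both copies of a mail would still show up in the folder.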

No, I think Alpine doesn't have a function for deleting duplicate mails;
you should look at tools made for this, for example
https://github.com/kdeldycke/mail-deduplicate

--
//Aho

d...@brannerchinese.com

Dec 18, 2021, 6:40:54 AM
I find mail-deduplicate inadequately documented, and some of the functionality doesn't work as expected. Output, for instance, seems always to be to mbox format, even when I specify Maildir input.

However, I find fdupes (available through many package managers) helpful.
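
For readers who want to see what a content-based check of this kind amounts to, here is a minimal Python sketch that groups the files in a Maildir's cur/ and new/ subdirectories by a hash of their bytes. It is not fdupes itself; the function name is hypothetical, and it only catches copies that are byte-for-byte identical, so two copies of a message with different headers will not match.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_identical_files(maildir):
    """Return lists of message files under a Maildir with identical bytes."""
    groups = defaultdict(list)
    for sub in ("cur", "new"):
        folder = Path(maildir) / sub
        if not folder.is_dir():
            continue
        for path in sorted(folder.iterdir()):
            if path.is_file():
                digest = hashlib.sha256(path.read_bytes()).hexdigest()
                groups[digest].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in groups.values() if len(paths) > 1]
```

The function only reports groups; which copy to delete is left to the user.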

- dpb

Eduardo Chappa

Dec 18, 2021, 12:41:58 PM
Dear dpb,

if you build alpine with maildir support, then the mailutil program
bundled with Alpine will be able to read a maildir folder and remove
duplicates. What you would do is to use the mailutil program as

mailutil dedup MAILBOX_NAME

If you do not supply a mailbox name, mailutil will remove duplicates from
your INBOX. For this purpose, two messages count as duplicates when they
have the same Message-ID.
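
To preview what a Message-ID based rule would treat as duplicates before removing anything, a small Python sketch using the standard-library mailbox module could look like this. It is an independent illustration of the same criterion, not mailutil's code; the function name is hypothetical, and skipping messages that lack a Message-ID is an assumption of the example.

```python
import mailbox
from collections import defaultdict


def duplicates_by_message_id(maildir_path):
    """Map each Message-ID that occurs more than once to its Maildir keys."""
    box = mailbox.Maildir(maildir_path, create=False)
    seen = defaultdict(list)
    for key, msg in box.iteritems():
        msg_id = str(msg.get("Message-ID") or "").strip()
        if msg_id:  # assumption: messages without a Message-ID are skipped
            seen[msg_id].append(key)
    return {mid: keys for mid, keys in seen.items() if len(keys) > 1}
```

The sketch only reads the folder and deletes nothing.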

I hope this helps.

--
Eduardo
https://tinyurl.com/yc377wlh (web)
http://repo.or.cz/alpine.git (Git)

Carlos E.R.

Dec 21, 2021, 7:12:08 AM
Thunderbird has an addon to do this. It searches a folder, and produces
a window listing duplicates (it displays several fields), offering to
delete them. I find it a useful function.

--
Cheers, Carlos.

Henning Hucke

Dec 23, 2021, 3:37:42 AM
On 2021-12-21, Carlos E.R. <robin_...@es.invalid> wrote:

> [...]
>
> Thunderbird has an addon to do this. It searches a folder, and produces
> a window listing duplicates (it displays several fields), offering to
> delete them. I find it a useful function.

Strange thing this is! I never had (real) duplicates except intentional ones.
The last part of that sentence means that it does indeed happen that I save
a mail to another folder without deleting the "original".
Aside from this, duplicates show up from sources which obviously don't
understand the purpose of a message ID and the necessity to avoid duplicates,
or which don't know how to generate unique identifiers.

Atlassian and Jira are a bad example of that...

Nonetheless they are not real duplicates in the sense of being
identical in message ID as well as mail body.

Best regards,
Henning
--
In the first place, God made idiots;
this was for practice; then he made school boards.
-- Mark Twain

Carlos E.R.

Dec 23, 2021, 7:36:06 AM
On 23/12/2021 09.07, Henning Hucke wrote:
> On 2021-12-21, Carlos E.R. <robin_...@es.invalid> wrote:
>
>> [...]
>>
>> Thunderbird has an addon to do this. It searches a folder, and
>> produces a window listing duplicates (it displays several fields),
>> offering to delete them. I find it a useful function.
>
> Strange thing this is! I never had (real) duplicates except intentional
> ones.
> The last part of that sentence means that it does indeed happen that I
> save a mail to another folder without deleting the "original".
> Aside from this, duplicates show up from sources which obviously don't
> understand the purpose of a message ID and the necessity to avoid
> duplicates, or which don't know how to generate unique identifiers.
>
> Atlassian and Jira are a bad example of that...
>
> Nonetheless they are not real duplicates in the sense of being
> identical in message ID as well as mail body.

They happen easily when having two or more computers with local folders,
when trying to keep things in sync between them.

Say, on computer A you save mails about SciFi to folder SciFi, and later
you do the same on computer B, but at that time there is a different
selection for whatever reason, and later you try to sync the two SciFi
folders.

Or you move some mails to a temporary folder, then a year later you find
that temporary and forgotten folder, and being afraid of deleting mails
you move them to a final folder, not remembering they are already there.

Things like that.

True duplicates.

So running a duplicate search finds them, and you can remove them
relatively easily.

Judging a dupe just by the Message-ID is a mistake. For instance, when you
post to a mailing list, your sent folder and your inbox both hold your
email with the same Message-ID, but if you look carefully you see
different headers, and sometimes different bodies.

Gmail makes exactly this mistake.
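
One possible stricter test, sketched in Python with the standard-library mailbox module: treat two messages as duplicates only when both the Message-ID and a hash of the body match. The function name is hypothetical, and the choice to compare the body but ignore the other headers is an assumption of this sketch (headers such as Received: nearly always differ between copies); it is not how any particular mail client decides.

```python
import email
import hashlib
import mailbox
from collections import defaultdict


def duplicates_by_id_and_body(maildir_path):
    """Group Maildir keys whose Message-ID and body bytes both match."""
    box = mailbox.Maildir(maildir_path, create=False)
    groups = defaultdict(list)
    for key in box.iterkeys():
        raw = box.get_bytes(key).replace(b"\r\n", b"\n")
        msg_id = str(email.message_from_bytes(raw)["Message-ID"] or "").strip()
        if not msg_id:
            continue  # assumption: messages without a Message-ID are skipped
        # Everything after the first blank line is the body; headers are
        # left out so that differing headers do not affect the comparison.
        body = raw.partition(b"\n\n")[2]
        groups[(msg_id, hashlib.sha256(body).hexdigest())].append(key)
    return [keys for keys in groups.values() if len(keys) > 1]
```

With this rule, a sent copy and a list copy that carries an added footer are reported as different messages even though they share a Message-ID.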

--
Cheers, Carlos.