[Python-Dev] Accessing mailing list archives

Bob Purvy

Jul 30, 2018, 4:58:25 PM
to pytho...@python.org
hi all,

I've been trying to figure out how to access the archives programmatically. I'm sure this is easy once you know, but googling various things hasn't worked.  What I want to do is graph the number of messages about PEP 572 by time.  (or has someone already done that?)

I installed GNU Mailman, and downloaded the gzip'ed archives for a number of months and unzipped them, and I suspect that there's some way to get them all into a single database, but it hasn't jumped out at me.  If I count the "Message-ID" lines, the "Subject:" lines, and the "\nFrom " lines in one of those text files, I get slightly different numbers for each.

Alternatively, they're maybe already in a database, and I just need API access to do the querying?  Can someone help me out?

Bob

Victor Stinner

Jul 30, 2018, 6:28:48 PM
to Bob Purvy, pytho...@python.org
Hi Bob,

I wrote a basic script to compute the number of emails per PEP. It requires downloading the gzipped mbox files from the per-month archive web page, then gunzipping them:
https://github.com/vstinner/misc/blob/master/python/parse_mailman_mbox_peps.py

Results:
https://mail.python.org/pipermail/python-committers/2018-April/005310.html

Victor

Michael Selik

Jul 31, 2018, 6:59:39 PM
to Victor Stinner, pytho...@python.org, Bob Purvy
Would it be possible to normalize by the number of mailing list members and also by "active" members? The latter would be tricky to define.

Victor Stinner

Jul 31, 2018, 8:32:55 PM
to Michael Selik, pytho...@python.org, Bob Purvy
Feel free to modify the script to make your own statistics ;-)

Victor

Cameron Simpson

Jul 31, 2018, 8:48:34 PM
to pytho...@python.org
On 30Jul2018 13:40, Bob Purvy <bpu...@gmail.com> wrote:
>I've been trying to figure out how to access the archives programmatically.
>I'm sure this is easy once you know, but googling various things hasn't
>worked. What I want to do is graph the number of messages about PEP 572 by
>time. (or has someone already done that?)
>
>I installed GNU Mailman, and downloaded the gzip'ed archives for a number
>of months and unzipped them, and I suspect that there's some way to get
>them all into a single database, but it hasn't jumped out at me. If I
>count the "Message-ID" lines, the "Subject:" lines, and the "\nFrom " lines
>in one of those text files, I get slightly different numbers for each.
>
>Alternatively, they're maybe *already* in a database, and I just need API
>access to do the querying? Can someone help me out?

Like Victor, I download mailing list archives. Between pulling them in and also
subscribing, ideally I get a complete history in my "python" mail folder.
Likewise for other lists.

The mailman archives are UNIX mbox files, compressed, with a bit of header
munging (to make address harvesting harder). You can concatenate them and
uncompress and reverse the munging like this:

cat *.gz | gunzip | fix-mail-dates --mbox | un-at-

where fix-mail-dates is here:

https://bitbucket.org/cameron_simpson/css/src/tip/bin/fix-mail-dates

and un-at- is here:

https://bitbucket.org/cameron_simpson/css/src/tip/bin/un-at-

and the output is a nice UNIX mbox file.
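
For readers without those scripts, the same idea can be roughed out in pure Python. This is only a sketch, not a port of fix-mail-dates or un-at- (their exact behaviour is not shown in this thread). It assumes the munging is the usual "user at example.com" form on "From " separator lines and "From:" headers; the names unmunge_line and combine_archives are made up for illustration.

```python
import gzip
import re

# Assumption: addresses are obscured as "user at example.com" on the
# "From " separator line and the "From:" header. Requiring a dot in the
# domain part avoids rewriting most ordinary sentences.
_MUNGED = re.compile(rb"^(From:? +\S+) at (\S+\.\S+)")


def unmunge_line(line: bytes) -> bytes:
    """Turn 'From: user at host.tld' back into 'From: user@host.tld'."""
    return _MUNGED.sub(rb"\1@\2", line, count=1)


def combine_archives(gz_paths, out_path):
    """Decompress each archive in the given order and append it to out_path."""
    with open(out_path, "wb") as out:
        for gz_path in gz_paths:
            last = b"\n"
            with gzip.open(gz_path, "rb") as src:
                for line in src:
                    out.write(unmunge_line(line))
                    last = line
            # Keep the next archive's "From " separator on its own line.
            if not last.endswith(b"\n"):
                out.write(b"\n")
```

Something like combine_archives(sorted(glob.glob("*.txt.gz")), "python-dev.mbox") would then produce one file. Note that the sketch does not track whether a line is in the headers or the body, so a rare body line that happens to start with "From" and look like a munged address would be rewritten too.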

You can load that into most mail readers or parse it with Python's email
modules (in the stdlib). It should be easy enough to scan such a thing and
count header contents etc. Ignore the "From " line content, prefer the "From:"
header. (Separate messages on "From " of course, just don't grab email
addresses from it.)
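
As a rough illustration of that kind of scan, here is a minimal sketch using the stdlib mailbox module, which does the splitting on "From " lines itself. It counts messages per month whose Subject mentions a given PEP, taking the month from the "Date:" header. The function name count_by_month and the subject pattern are assumptions for the example, not something from Victor's script.

```python
import mailbox
import re
from collections import Counter
from email.utils import parsedate_to_datetime


def count_by_month(mbox_path, pep_number=572):
    """Count messages per 'YYYY-MM' whose Subject mentions the given PEP."""
    # Matches "PEP 572", "PEP572", "pep-572"; the word boundary keeps
    # "PEP 5720" from matching.
    pattern = re.compile(rf"\bPEP[\s-]*{pep_number}\b", re.IGNORECASE)
    counts = Counter()
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            # Use the headers, not the "From " separator line.
            subject = str(msg.get("Subject", ""))
            if not pattern.search(subject):
                continue
            try:
                when = parsedate_to_datetime(str(msg.get("Date", "")))
            except (TypeError, ValueError):
                continue  # missing or unparseable Date header
            counts[when.strftime("%Y-%m")] += 1
    finally:
        box.close()
    return counts
```

The resulting Counter can be sorted by key and handed to whatever plotting tool you prefer. Matching on Subject alone will miss replies in retitled threads, so treat the numbers as approximate.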

Cheers,
Cameron Simpson <c...@cskk.id.au>