[dev] Suckless ML archiver?

Scott Lawrence

Mar 3, 2012, 3:21:42 PM
to Suckless-Dev
I notice that project_ideas lists having a decent mailing list web archiver
system as a goal - I've been parsing RFC5322 messages anyway, so here's a
quick hack of an archiver[1]. 300 lines of go (not counting the go-mail
library, which adds another 300). Takes an mbox file and spits out a
directory full of html-ified messages and an index file, with threading shown
in a manner similar to (hy|pi)permail et al. Sorry I don't have any demo
online - I don't have any interesting mbox files to run it on. No multipart
support ATM, although it's easy to add, since that's in the go stdlib.

There are plenty of things that still need to be done to make this decent; if
there's interest, I'd be happy to take suggestions and get it fully working.
This is just a "hey look at me!".

[1] https://github.com/bytbox/slark

p.s. thanks for dwm et al!

--
Scott Lawrence
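The (hy|pi)permail-style threading described above can be sketched with Go's standard net/mail package. This is a minimal illustration, not slark's actual code: it assumes a message's parent is identified only by its In-Reply-To header (real archivers usually also consult References), and the msg type and the parseMsg and threadIndex functions are hypothetical names.

```go
package main

import (
	"fmt"
	"net/mail"
	"strings"
)

// msg is a hypothetical minimal record for one archived message.
type msg struct {
	ID, InReplyTo, Subject string
}

// parseMsg reads the header block of one RFC 5322 message.
func parseMsg(raw string) (msg, error) {
	m, err := mail.ReadMessage(strings.NewReader(raw))
	if err != nil {
		return msg{}, err
	}
	h := m.Header
	return msg{
		ID:        strings.TrimSpace(h.Get("Message-ID")),
		InReplyTo: strings.TrimSpace(h.Get("In-Reply-To")),
		Subject:   h.Get("Subject"),
	}, nil
}

// threadIndex returns an indented outline of msgs. A message whose
// In-Reply-To names another message in the set is nested under it;
// every other message becomes a thread root, kept in input order.
func threadIndex(msgs []msg) string {
	known := make(map[string]bool, len(msgs))
	for _, m := range msgs {
		known[m.ID] = true
	}
	children := make(map[string][]msg)
	var roots []msg
	for _, m := range msgs {
		if m.InReplyTo != "" && known[m.InReplyTo] {
			children[m.InReplyTo] = append(children[m.InReplyTo], m)
		} else {
			roots = append(roots, m)
		}
	}
	var b strings.Builder
	var walk func(m msg, depth int)
	walk = func(m msg, depth int) {
		fmt.Fprintf(&b, "%s%s\n", strings.Repeat("  ", depth), m.Subject)
		for _, c := range children[m.ID] {
			walk(c, depth+1)
		}
	}
	for _, r := range roots {
		walk(r, 0)
	}
	return b.String()
}

func main() {
	raws := []string{
		"Message-ID: <1@example>\nSubject: Suckless ML archiver?\n\nbody\n",
		"Message-ID: <2@example>\nIn-Reply-To: <1@example>\nSubject: Re: Suckless ML archiver?\n\nbody\n",
	}
	var msgs []msg
	for _, r := range raws {
		m, err := parseMsg(r)
		if err != nil {
			panic(err)
		}
		msgs = append(msgs, m)
	}
	fmt.Print(threadIndex(msgs))
}
```

Building the children map in one pass keeps the index generation linear in the number of messages; an HTML archiver would emit nested lists instead of indented text.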

Anselm R Garbe

Mar 17, 2012, 1:27:00 PM
to dev mail list

The mlmmj output format is a directory consisting of files (1-n) where
each contains a single message in mbox format. The number (1-n) is
incremented for each message. For instance the d...@suckless.org
mailing list directory contains 11359 message files as of now. You
could extend your archiver to work on such a directory structure. Once
done, I would give it a go on the d...@suckless.org messages.

Cheers,
Anselm

Scott Lawrence

Mar 17, 2012, 3:56:47 PM
to dev mail list
Hi Anselm,

On Sat, 17 Mar 2012, Anselm R Garbe wrote:

> The mlmmj output format is a directory consisting of files (1-n) where
> each contains a single message in mbox format. The number (1-n) is
> incremented for each message. For instance the d...@suckless.org
> mailing list directory contains 11359 message files as of now. You
> could extend your archiver to work on such a directory structure. Once
> done, I would give it a go on the d...@suckless.org messages.

A single message in mbox format? Or a single message in RFC5322 format (as
typically found in mboxes)? Or single message in not-quite-standard format
(such as used by pipermail behind the scenes)?

If the former, a call to `cat` would suffice to "extend" my archiver.

--
Scott Lawrence

Linux jagadai 3.2.9-1-ARCH #1 SMP PREEMPT Thu Mar 1 09:31:13 CET 2012 x86_64 Intel(R) Core(TM)2 Duo CPU P8700 @ 2.53GHz GenuineIntel GNU/Linux

Anselm R Garbe

Mar 17, 2012, 4:01:29 PM
to dev mail list
On 17 March 2012 20:56, Scott Lawrence <byt...@gmail.com> wrote:
> On Sat, 17 Mar 2012, Anselm R Garbe wrote:
>> The mlmmj output format is a directory consisting of files (1-n) where
>> each contains a single message in mbox format. The number (1-n) is
>> incremented for each message. For instance the d...@suckless.org
>> mailing list directory contains 11359 message files as of now. You
>> could extend your archiver to work on such a directory structure. Once
>> done, I would give it a go on the d...@suckless.org messages.
>
>
> A single message in mbox format? Or a single message in RFC5322 format (as
> typically found in mboxes)? Or single message in not-quite-standard format
> (such as used by pipermail behind the scenes)?

Sorry for the confusion, it is rfc5322 format.

> If the former, a call to `cat` would suffice to "extend" my archiver.

Ok, will give it a try.

Cheers,
Anselm

Scott Lawrence

Mar 17, 2012, 4:06:42 PM
to dev mail list

Oh, if it's just rfc5322, then a simple 'cat' won't do (slark expects an
actual mbox ATM). I'll patch it to handle a sensible directory layout in the
next few days. (Sorry about being so slow to make improvements - I'm somewhat
overloaded for a few weeks.)

Other improvements needed (in case anybody wants to learn go by patching the
go-mail library): handle multipart and the common message encodings, handle
HTML messages elegantly (sanitize but leave basic styling when available?),
and handle UTF headers. (Actually, #2, the HTML handling, might be best done
in slark, not go-mail.)
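For the "UTF headers" item, Go's standard mime package already decodes RFC 2047 encoded-words (mime.WordDecoder, available since Go 1.5). A minimal sketch, assuming "UTF headers" means encoded-words such as =?UTF-8?Q?...?= in Subject or From lines; decodeHeader is a hypothetical helper, not part of slark or go-mail.

```go
package main

import (
	"fmt"
	"mime"
)

// decodeHeader turns RFC 2047 encoded-words, e.g. "=?UTF-8?Q?caf=C3=A9?=",
// into plain UTF-8 text. A zero-value mime.WordDecoder understands only
// the UTF-8, US-ASCII and ISO-8859-1 charsets; other charsets make
// DecodeHeader return an error unless its CharsetReader field is set.
func decodeHeader(v string) string {
	var dec mime.WordDecoder
	s, err := dec.DecodeHeader(v)
	if err != nil {
		return v // keep the raw header rather than dropping it
	}
	return s
}

func main() {
	fmt.Println(decodeHeader("=?UTF-8?Q?caf=C3=A9?="))            // café
	fmt.Println(decodeHeader("=?ISO-8859-1?Q?Andr=E9?= Pirard")) // André Pirard
	fmt.Println(decodeHeader("=?UTF-8?B?w7w=?="))                // ü
}
```

Falling back to the raw value means an unsupported charset still shows up in the index, just undecoded.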

markus schnalke

Mar 18, 2012, 4:46:56 AM
to dev mail list
[2012-03-17 16:06] Scott Lawrence <byt...@gmail.com>

> On Sat, 17 Mar 2012, Anselm R Garbe wrote:
> > On 17 March 2012 20:56, Scott Lawrence <byt...@gmail.com> wrote:
> >> On Sat, 17 Mar 2012, Anselm R Garbe wrote:
> >>>
> >>> The mlmmj output format is a directory consisting of files (1-n) where
> >>> each contains a single message in mbox format. The number (1-n) is
> >>> incremented for each message. For instance the d...@suckless.org
> >>> mailing list directory contains 11359 message files as of now. You
> >>> could extend your archiver to work on such a directory structure. Once
> >>> done, I would give it a go on the d...@suckless.org messages.
> >>
> >> A single message in mbox format? Or a single message in RFC5322 format (as
> >> typically found in mboxes)? Or single message in not-quite-standard format
> >> (such as used by pipermail behind the scenes)?
> >
> > Sorry for the confusion, it is rfc5322 format.

That means, if you add a `.mh_sequences' file, then you have an MH
mail folder -- great.

> Oh, if it's just rfc5322, then a simple 'cat' won't do (slark expects an
> actual mbox ATM).

If you have nmh installed, then you can use packf(1) to generate an
mbox, even to stdout (packf -file /dev/stdout | ...)

AFAIK the differences between an mbox containing one message and a
plain RFC822 message (MH mail store format) are the `From ' line (line
number 1) and that each subsequent line starting with ``From '' will
be prefixed with `>'. Awk will convert the format easily for you.


meillo
