I desire a way to access (read: "search") ~20K emails in one command,
under VM. 15K of those emails are in twenty mbox-format files, and
7K of them are individual files in Maildir format.
Q1: VM is slow on large mboxes. What is the proper way to archive mail?
I want to be able to search all messages at all times.
Q2: In a misguided attempt to try keeping all mail accessible at once,
I abandoned VM for mutt/Maildir. Mutt's keyboard interface sucks,
and I desperately want to come back to VM. But not if VM is slow.
(See Q1.)
- Fred
> I desire a way to access (read: "search") ~20K emails in one command,
> under VM. 15K of those emails are in twenty mbox-format files, and
> 7K of them are individual files in Maildir format.
>
> Q1: VM is slow on large mboxes. What is the proper way to archive mail?
> I want to be able to search all messages at all times.
I don't find it so. I keep around 6k messages (40MB) in my inbox, and
searching for messages containing text is almost instant on modern
machines.
Start-up does take a few seconds, long enough to be annoying, but I
only ever have one VM running, and connect to it remotely, so I only
start up once every few weeks.
I also routinely load the last 10 years' worth of archived mail (about
100MB, 22k messages, in 100 mboxes (all actually in mmdf
format)). Again, start-up is slow (a minute or two on my ancient
laptop), but searching is fast.
So I'd say if you really want to be able to search all your mail all
the time, keep one VM going in background, and connect to it with
gnu-client.
Okay, you're totally right, and I misremembered. It takes forever to
/save/, which is a designed tradeoff. (KJ, July 25, 1999: "VM, being a mail
reader, does much more reading and searching than writing, so I think
optimizing for reads is a better choice [than using Maildir].")
The problem is that I auto-save my files every minute or two, and
hit 'gS' to get new mail, which I also do often. This is my way of
never(!) losing data in a crash.
So now I have a new question:
How do I use VM in a way so that it is also fast on saves--I rarely
manipulate mail more than one day old--and allow searching all messages
all the time?
> How do I use VM in a way so that it is also fast on saves--I rarely
> manipulate mail more than one day old--and allow searching all messages
> all the time?
Use virtual folders.
I have the last month or so's mail in a real folder INBOX-new,
and the last year or so's mail in INBOX-lessnew, and then the folder that I
actually use is INBOX, which is a virtual folder combining the two.
Of course, once a month or so, I have to move mail from INBOX-new to
INBOX-lessnew .
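For anyone who wants to try this, a minimal sketch of such a setup in
.vm might look like the following. The ~/Mail paths are assumptions
for illustration (substitute wherever your real folders live), and the
(any) selector simply matches every message in the listed folders.

```elisp
;; Hypothetical paths: adjust to the location of your own real folders.
(setq vm-virtual-folder-alist
      '(("INBOX"                       ; name of the virtual folder
         (("~/Mail/INBOX-new"          ; real folder: last month or so
           "~/Mail/INBOX-lessnew")     ; real folder: last year or so
          (any)))))                    ; selector: include every message
```

Saves then only rewrite whichever real folder actually changed, which
is normally the small INBOX-new.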
Oh, I had no idea you could do that!
vm-virtual-folder-alist. nice.
> Of course, once a month or so, I have to move mail from INBOX-new to
> INBOX-lessnew .
Is that just concatenation? Why don't you keep them in separate monthly
INBOXes and provide the directory to VM? Personal preference?
All my mail is archived in gmail, which is the best option for me
when I need to search. I also keep the most recent 2k messages
(~80MB) around in my VM inbox (autosaving is a pain).
I use gmail imap to download new messages (with fetchmail). Messages
from a search result can be marked unread and they will be
re-downloaded by fetchmail. I also have a firefox extension that
saves a message (from the gmail web GUI) directly to the VM spool
file without going the imap route.
In case you also want to try this: older messages in VM
can be re-delivered to gmail via a script similar to the one below:
#!/bin/sh
# Re-deliver every message in the given VM mbox files to a gmail address.
dest=you...@gmail.com
for mbox in "$@"; do
    if [ ! -f "$mbox" ]; then
        echo "$mbox cannot be found."
        continue
    fi
    printf '%s' "forwarding vm mailbox $mbox to $dest..."
    # split the mailbox (-s), add a header (-A for gmail filtering) and remove
    # the Received: headers (-I)
    formail -A"X-Archive: VM2Gmail 2008" -I Received: -s ssmtp "$dest" < "$mbox"
    echo "Done."
done
>> Of course, once a month or so, I have to move mail from INBOX-new to
>> INBOX-lessnew .
> Is that just concatenation? Why don't you keep them in separate monthly
> INBOXes and provide the directory to VM? Personal preference?
It's part of my housekeeping. I do the move manually from VM, 'cos it
makes me go through the mail and delete the stuff I don't need to
keep. (I get a lot of attachment-heavy administrative mail which has a
useful lifetime of a few weeks to a few months. And because I'm in
academia, there's a lot of mail which is relevant for the duration of
an academic year.)
In fact, I do two manual winnowings: quick and dirty, from INBOX-new
to INBOX-lessnew, and slow and more vicious from INBOX-lessnew to
monthly folders.
Sounds inefficient, but it works for me;-)
Oh, the other thing I do for speed is to disable autosave while the
main frame is mapped.
> Oh, the other thing I do for speed is to disable autosave while the
> main frame is mapped.
Thanks for an interesting discussion, Julian. But I am not sure what
you mean by "while the main frame is mapped". You mean you turn it off
in the INBOX-lessnew folder? It would be useful to know.
Autosave is what gets me too.
And, calculating summary lines, which is so painfully slow! Rmail used
to store cached summary lines in the folder so that it could display
summaries quickly. (It probably still does.) I think VM needs to do
something similar.
Cheers,
Uday
>> Oh, the other thing I do for speed is to disable autosave while the
>> main frame is mapped.
>
> Thanks for an interesting discussion, Julian. But I am not sure what
> you mean by "while the main frame is mapped". You mean you turn it
> off in the INBOX-lessnew folder? It would be useful to know.
Um. Yes, that statement doesn't make much sense without knowing my
context! Sorry.
The way I work is to use a dedicated XEmacs for VM. This runs on my
office workstation on the desktop. If I'm anywhere else, I ssh to the
workstation and connect via gnuclient to that VM.
I have an auto-save-hook set that suppresses auto-save if the initial
frame of the XEmacs (i.e. the one I actually use on the desktop) is
mapped. This stops auto-saving while I'm using it at work, but not if
I'm using it from elsewhere.
However, in fact, this is a non-problem for me now, as I've changed my
usage pattern so that I almost always do an explicit save every time I
do something to change the folder. (Why? Because auto-save doesn't
save attributes, and it's a pain going through the auto-saved file
bringing the attributes into accordance with reality. Also, I unison
frequently, so I want the file to match the buffer.)
> However, in fact, this is a non-problem for me now, as I've changed my
> usage pattern so that I almost always do an explicit save every time I
> do something to change the folder. (Why? Because auto-save doesn't
> save attributes, and it's a pain going through the auto-saved file
> bringing the attributes into accordance with reality. Also, I unison
> frequently, so I want the file to match the buffer.)
Thanks for the explanation. A couple of points:
- Explicit save is likely to take just as much time as auto-save, though
I suppose you can do it selectively and be a bit smarter about it.
Still, I have to admire your discipline in saving the folder every time
you change it. (I used to have a similar habit, but have lost it now because
it takes too long to do saves.)
- auto-save does save attributes, because VM flushes attribute changes
to the folder. See the variable 'vm-flush-interval'. Its default value
is 90 seconds. You might have changed it somewhere.
Still, this has been a good discussion. Splitting email into multiple
folders and selectively turning off auto-save for them is a good way to
go.
Cheers,
Uday
> - auto-save does save attributes, because VM flushes attribute changes
> to the folder. See the variable 'vm-flush-interval'. Its default
> value is 90 seconds. You might have changed it somewhere.
True. Looking in my .vm, I see
; there is a bug which in newer vms under 21.4 causes freezes
(setq vm-flush-interval nil)
Unfortunately, this dates from before I bothered version-controlling
my .vm, so I don't know what "newer" means. However, I suppose that
this no longer applies, so I'll try turning it back on.
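Turning it back on should just be a matter of restoring the value
mentioned above. A sketch, assuming 90 seconds is still the default
in your VM version:

```elisp
;; Flush cached attribute changes into the folder buffer every 90 seconds.
;; nil (the setting shown above) defers flushing until an explicit save.
(setq vm-flush-interval 90)
```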
It does, just look at the X-VM-v5-Data header, so it is slowed
down somewhere else ...
Robert.
I did a profiling experiment using elp.
Test 1:
Function Name                           Call Count  Elapsed Time  Average Time
======================================  ==========  ============  ============
vm-do-summary                                    1  12.858        12.858
vm-do-needed-summary-rebuild                     1  12.858        12.858
vm-tokenized-summary-insert                   1516  9.3029999999  0.0061365435
vm-decode-mime-encoded-words-in-string        3033  6.7909999999  0.0022390372
vm-decode-mime-encoded-words                    13  6.65          0.5115384615
vm-mime-base64-decode-region                    10  3.9659999999  0.3965999999
vm-mime-Q-decode-region                          4  2.654         0.6635
vm-summarize                                     1  0.2209999999  0.2209999999
vm-su-summary                                 3032  0.0510000000  1.68...e-005
vm-su-mark                                    1516  0.02          1.31...e-005
vm-set-summary-pointer                           1  0.0           0.0
vm-summary-highlight-region                      1  0.0           0.0
vm-summary-xxxx-highlight-region                 1  0.0           0.0
vm-do-needed-folders-summary-update              3  0.0           0.0
So, some 60% of the time of vm-tokenized-summary-insert is being spent
in decode-mime. There were only 14 instances of decoding needed out of
1516 header lines, which took up all this time.
The first solution to try would be to cache mime-decoded strings in the
tokenized summary entry of the message.
Cheers,
Uday
> <newsspam5...@robf.de> wrote in message news:85wsp22...@robf.de...
>>> And, calculating summary lines, which is so painfully slow! Rmail used to
>>> store cached summary lines in the folder so that it could display summaries
>>> quickly. (It probably still does.) I think VM needs to do something
>>> similar.
>>
>> It does, just look at the X-VM-v5-Data header, so it is slowed
>> down somewhere else ...
>
> I did a profiling experiment using elp.
Hey, I never used elp before, thanks for enlightening me ;-)
Have you been doing an elp-instrument-package or something else?
[...]
> So, some 60% of the time of vm-tokenized-summary-insert is being spent in
> decode-mime. There were only 14 instances of decoding needed out of 1516
> header lines, which took up all this time.
>
> The first solution to try would be to cache mime-decoded strings in the
> tokenized summary entry of the message.
Hmm, decoding and other things should happen only once for a
message or when recreating the summary entry. Do those 14
instances correlate to a total of 14 attribute changes, marks or
new messages?
Indeed, caching might give a boost ...
Robert.
Well, actually, I have to admit it wasn't all that straightforward. I used
elp-instrument-list on a list of all the functions in vm-summary.el. I also
set debug-on-quit to t and typed ^G at random times to find out where VM was
spending time. That is how I discovered that mime decoding was key. Then
I added all the functions in vm-mime.el to the instrumented functions.
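Roughly, the session looked like the sketch below. The function list
here is only a short illustrative subset taken from the results table,
not the full list I instrumented, and VM has to be loaded first so
that the functions are defined.

```elisp
(require 'elp)

;; Instrument a hand-picked list of functions (illustrative subset only).
(elp-instrument-list
 '(vm-do-summary
   vm-tokenized-summary-insert
   vm-decode-mime-encoded-words-in-string
   vm-mime-base64-decode-region
   vm-mime-Q-decode-region))

;; Optional: drop into the debugger on C-g to see where time is going.
(setq debug-on-quit t)

;; ... now rebuild the summary of a large folder in VM by hand ...

(elp-results)       ; show call counts and elapsed times
(elp-restore-all)   ; remove the instrumentation when finished
```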
> Hmm, decoding and other things should happen only once for a
> message or when recreating the summary entry. Do those 14
> instances correlate to a total of 14 attribute changes, marks or
> new messages?
No, they were just for decoding mime-encoded "From" or "Subject" headers. I
also thought that calling external mime decoders for these things is kind of
silly. When I re-ran the experiment after setting
vm-mime-qp-decoder-program etc to nil, the time for mime decoding became
negligible. So, this is easy to fix.
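In .vm terms, the change is something like the following sketch. I am
assuming that the base64 decoder variable is the other one that
matters here; check which decoder programs your VM version actually
defines.

```elisp
;; Use VM's built-in Lisp decoders instead of spawning external programs
;; for quoted-printable and base64 encoded words in headers.
(setq vm-mime-qp-decoder-program nil
      vm-mime-base64-decoder-program nil)
```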
But this only cut the time by half. The remaining half is still
significant. Well, I will have to look at it again another day...
Cheers,
Uday