I want to continue the discussion about this topic that started with
my request (http://groups.google.de/groups?selm=8utj4k%24cbi%241%40news.messer.de)
on 2000-11-15.
I thank all parties for their comments and suggestions.
The term "SMTP mail" that I've chosen specifies a message according to
the standard "RFC 822" (http://ietf.org/rfc/rfc0822.txt). This format
can be sent over the communication protocol "SMTP" and some others.
Excerpt from the book "Programming Internet Email" - Chapter 4
"MIME-compliant messages", Page 60:
"...
the simplest MIME-compliant message is just an RFC 822 simple text
message with an added MIME-Version header.
..."
This request's subject is the deletion of messages in the file "mbox".
1. Some programmes provide the prompt/command line as their
programming interface.
The deletion can be performed manually by typing the letter "d" with a
following message number inside the programme "mail". (This step can
be performed by a script, too.)
I encounter the difficulty to scroll through all existing messages
by a command to find the right number for the message that should be
deleted. I am looking for an efficient algorithm.
2. I've noticed that the "landscape" of comfortable programming
interfaces has improved.
2.1 PHP
http://pear.php.net/packages.php?catpid=14&catname=Mail
What a pity: the package "mailparse" has the state "0.9.1 beta"
and the package "Mail_Mbox" has the state "0.1.5 alpha".
2.2 TCL
http://tcllib.sourceforge.net/doc/index.html
mime - Manipulation of MIME body parts
Tcl MIME - generates and parses MIME body parts
2.3 Perl
http://search.cpan.org/modlist/Mail_and_Usenet_News/Mail/
The package "Mail::Box" (Mail folder manager and MUA backend,
http://perl.overmeer.net/mailbox/) has the state "2.034 beta".
The package "Mail::MboxParser" (read-only access to
UNIX-mailboxes) has the state "0.38".
The package "Mail::Field" (Base class for manipulation of mail
header fields) has the state "1.58".
2.4 Python
http://www.python.org/doc/lib/netdata.html
12.2 email -- An email and MIME handling package
12.4 mailbox -- Read various mailbox formats
2.5 Ruby
http://www.ruby-lang.org/raa/cat.rhtml?category_major=Library;category_minor=Mail
The package "mbox" (mbox Read and write UNIX' mbox mail-format)
has the state "20001026 unstable".
2.6 Java
http://java.sun.com/products/javamail/
The package "JavaMail" has the state "1.3, 26 June 2002".
Is a protocol provider for the format "Mbox" available?
3. A method for the deletion seems to be missing. I come to a
fundamental programming problem now.
How can a piece be cut out from the file "mbox" that can be huge in
size and that is accessed rapidly and perhaps without breaks while new
messages are appended to the file end?
(The programme "mailx" has got a built-in solution. I don't want to
look at the source files.)
3.1 I do not know yet how to put the ends of the gap together again.
3.2 The file can always be written (without the deleted contents) to a
new version. The updated version replaces the old one. Is this
applicable if you think about speed?
A problem detail is also to acquire a lock for the write access to
update the file.
Sincerely,
Markus Elfring
You may find http://www.jwz.org/doc/mailsum.html helpful.
Ben
in comp.programming i read:
>The term "SMTP mail" that I've chosen specifies a message according to
>the standard "RFC 822" (http://ietf.org/rfc/rfc0822.txt). This format
>can be sent over the communication protocol "SMTP" and some others.
rfc 2822 is the current standard. almost no system stores that format
unmodified, e.g., the unix mbox format prefixes it with a `from ' line and
prefixes lines that start with that string with a `>', and may retain the
smtp envelope as return-path and x- headers. but even that's not
universal, e.g., solaris doesn't write a `from ' line; instead it writes a
content-length header which is exactly correct.
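A minimal Python sketch of the `>` quoting step, assuming the "mboxrd" convention (which also escapes lines that already begin with ">From ", so the quoting can be undone exactly); other mbox variants quote differently. The separator is written "From " with a capital F. The function names are illustrative only.

```python
import re

# mboxrd rule (an assumption; other mbox variants differ): any line that
# matches ^>*From  gets one extra '>' when stored, and loses one when read.
_QUOTE = re.compile(r"^(>*From )", re.MULTILINE)
_UNQUOTE = re.compile(r"^>(>*From )", re.MULTILINE)


def quote_from_lines(message: str) -> str:
    """Escape lines that a reader would otherwise take for a separator."""
    return _QUOTE.sub(r">\1", message)


def unquote_from_lines(stored: str) -> str:
    """Undo quote_from_lines by removing one leading '>'."""
    return _UNQUOTE.sub(r"\1", stored)


def make_mbox_entry(envelope_sender: str, date: str, message: str) -> str:
    """Build one mbox entry: 'From ' line, quoted message, blank line."""
    body = quote_from_lines(message)
    if not body.endswith("\n"):
        body += "\n"
    return f"From {envelope_sender} {date}\n{body}\n"
```

A header such as "From: alice@example.com" is left alone because the pattern requires a space directly after "From".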
>This request's subject is the deletion of messages in the file "mbox".
>1. Some programmes provide the prompt/command line as their
>programming interface.
that is a human interface. a programming interface is typically present in
the form of scripts manipulating variables and calling functions.
libraries are available that can manipulate various mailbox formats,
e.g., c-client from the university of washington.
>I encounter the difficulty to scroll through all existing messages
>by a command to find the right number for the message that should be
>deleted. I am looking for an efficient algorithm.
there is no particularly efficient algorithm, you must read the entire
file, collecting all the sensible headers (e.g., from, subject) as you
progress. to make future manipulation more efficient you can record the
position within the file of each message.
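A minimal Python sketch of that single indexing pass, assuming classic mbox where a message starts at a "From " line located at the beginning of the file or directly after a blank line (Content-Length headers are ignored). `scan_mbox` and `MessageIndex` are hypothetical names, and the header handling is deliberately simple: folded continuation lines are skipped.

```python
from dataclasses import dataclass, field


@dataclass
class MessageIndex:
    offset: int                                   # byte offset of the "From " line
    headers: dict = field(default_factory=dict)   # selected header values


def scan_mbox(path, wanted=("from", "subject", "date")):
    """One pass over a 'From '-separated mbox, recording each message's
    starting offset and the first value of each wanted header."""
    index = []
    offset = 0
    prev_blank = True      # the start of the file counts as a boundary
    in_headers = False
    with open(path, "rb") as f:
        for raw in f:
            if prev_blank and raw.startswith(b"From "):
                index.append(MessageIndex(offset))
                in_headers = True
            elif in_headers:
                if raw in (b"\n", b"\r\n"):
                    in_headers = False            # blank line ends the headers
                elif b":" in raw and not raw[:1].isspace():
                    name, _, value = raw.decode("latin-1").partition(":")
                    key = name.strip().lower()
                    if key in wanted:
                        index[-1].headers.setdefault(key, value.strip())
            prev_blank = raw in (b"\n", b"\r\n")
            offset += len(raw)
    return index
```

The saved offsets are what later seek-based manipulation (or a "d <number>" command) would use instead of re-reading the whole file.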
>3. A method for the deletion seems to be missing. I come to a
>fundamental programming problem now.
>How can a piece be cut out from the file "mbox" that can be huge in
>size and that is accessed rapidly and perhaps without breaks while new
>messages are appended to the file end?
nothing can be done until you have your system's locking mechanism
implemented.
let me reiterate that: you cannot manipulate an mbox-style mailbox without
having obtained a proper lock. and there is no universal lock that
always works; each system has its own notion of what is appropriate. once
you have the lock no other changes will be allowed, i.e., no new messages
will be added by the lda and no other processes should be manipulating it.
this is one of the reasons that other mailbox formats (e.g., mh, maildir
and mbx) have been created.
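As one illustration only: on a system whose delivery agent honours both a `<mbox>.lock` dot-lock file and POSIX `fcntl` locks (many Unix systems use some combination of these, but that must be checked locally), the lock could be taken as below. `MboxLock` is a hypothetical helper; it does no stale-lock cleanup, and creating the dot-lock with `O_EXCL` is known to be unreliable on older NFS implementations.

```python
import fcntl
import os
import time


class MboxLock:
    """Hypothetical helper: take '<mbox>.lock' and an fcntl lock, in that order.

    Assumes the local delivery agent honours both mechanisms; which ones a
    given system actually uses must be checked locally.
    """

    def __init__(self, path, timeout=30.0):
        self.path = path
        self.lockfile = path + ".lock"
        self.timeout = timeout
        self.fd = None

    def __enter__(self):
        deadline = time.monotonic() + self.timeout
        while True:
            try:
                # O_EXCL makes creation fail while another process holds the dot-lock.
                os.close(os.open(self.lockfile, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644))
                break
            except FileExistsError:
                if time.monotonic() >= deadline:
                    raise TimeoutError(f"could not create {self.lockfile}")
                time.sleep(0.1)
        try:
            self.fd = os.open(self.path, os.O_RDWR)
            fcntl.lockf(self.fd, fcntl.LOCK_EX)      # blocks until granted
        except BaseException:
            if self.fd is not None:
                os.close(self.fd)
                self.fd = None
            os.unlink(self.lockfile)
            raise
        return self.fd

    def __exit__(self, *exc_info):
        fcntl.lockf(self.fd, fcntl.LOCK_UN)
        os.close(self.fd)
        self.fd = None
        os.unlink(self.lockfile)
        return False
```

Usage would be `with MboxLock("/var/mail/user") as fd: ...`, doing all reads and writes through `fd` inside the block.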
once the mailbox is locked you can use a shuffle loop (seek/read/seek/write)
to copy the portion of the mailbox that follows the deleted message to its
new position (over the top of the deleted message), truncating the file to
the new length. if there are multiple messages to be deleted then you can
optimize this by doing all the messages in a single pass.
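The single-pass shuffle loop can be sketched in Python using `os.pread`/`os.pwrite` (positioned equivalents of seek/read/seek/write) and `os.ftruncate`. It assumes the caller already holds the lock, that `fd` is open read/write, and that the byte ranges of the deleted messages (for example from offsets recorded during the scan) do not overlap; `compact_in_place` is a hypothetical name.

```python
import os


def compact_in_place(fd, ranges, chunk=64 * 1024):
    """Delete the byte ranges [start, end) from the file open as fd in one
    pass, shifting the surviving bytes towards the start of the file and
    then truncating it. Returns the new file length.

    Assumes the mailbox lock is held and the ranges do not overlap.
    """
    size = os.fstat(fd).st_size
    ranges = sorted(ranges)
    if not ranges:
        return size
    write_pos = ranges[0][0]            # first byte to be overwritten
    # Surviving segments: from the end of each deleted range to the start
    # of the next one (or to the end of the file).
    next_starts = [start for start, _ in ranges[1:]] + [size]
    for (_, keep_from), keep_to in zip(ranges, next_starts):
        read_pos = keep_from
        while read_pos < keep_to:
            data = os.pread(fd, min(chunk, keep_to - read_pos), read_pos)
            if not data:
                raise OSError("file shrank while compacting")
            written = os.pwrite(fd, data, write_pos)
            read_pos += written
            write_pos += written
    os.ftruncate(fd, write_pos)
    return write_pos
```

Because data only ever moves toward the start of the file, every write lands on bytes that have already been read, so no extra buffer is needed.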
now let me back-pedal some: if you are only deleting messages then since
you will only be moving messages `up' in the file you can risk doing so
without locking the file, but you must take great care to: re-validate the
seek positions if you saved them during the initial scan and abort if there
is any change (because another copy of your program is running), and create
a dummy message at the end of the file that *exactly* consumes the space
you eliminate and check for a size increase before you change the file
length. if the file increased in size then a new message or messages have
arrived which you'll have to shuffle into place (i.e., repeat these steps).
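The two checks described above (re-validating the saved offsets, and detecting growth before changing the file length) might look like this in Python. The sketch does not create the dummy message; `check_before_truncate` and its return strings are hypothetical, and it assumes the saved offsets point at "From " separator lines recorded during the initial scan.

```python
import os


def check_before_truncate(path, saved_offsets, scanned_size):
    """Hypothetical helper for the lock-free deletion variant.

    Returns "changed" if a saved offset no longer starts with a "From "
    separator or the file shrank (another copy of the program is at work:
    abort), "grown" if new mail was appended after the scan (shuffle it
    into place and repeat), or "unchanged" if it is safe to truncate.
    """
    with open(path, "rb") as f:
        for off in saved_offsets:
            f.seek(off)
            if f.read(5) != b"From ":
                return "changed"
        size = os.fstat(f.fileno()).st_size
    if size < scanned_size:
        return "changed"
    if size > scanned_size:
        return "grown"
    return "unchanged"
```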
>(The programme "mailx" has got a built-in solution.
>3.1 I do not know yet how to put the ends of the gap together again.
perhaps scripting interaction with mailx is what you want to do then.
>3.2 The file can always be written (without the deleted contents) to a
>new version. The updated version replaces the old one. Is this
>applicable if you think about speed?
it can be acceptable, on good systems. but if the mailbox is huge it's
better to use a different format (mh, maildir or mbx). in fact i would be
tempted to do it all via the imap, provided such a service can be enabled.
--
bringing you boring signatures for 17 years
:The deletion can be performed manually by typing the letter "d" with a
:following message number inside the programme "mail". (This step can
:be performed by a script, too.)
:
:I encounter the difficulty to scroll through all existing messages
:by a command to find the right number for the message that should be
:deleted. I am looking for an efficient algorithm.
I don't believe that the format for physically storing mail messages
within a file is documented in any standard. Neither, unfortunately, is
the protocol for interacting with such a file. This leaves anyone wanting
to do things with the box in a quandary. First, one has no guaranteed way
of knowing how the messages are stored. Are they merely a series of
messages appended one to another, with perhaps some sort of initial line
indicating something about the message (the most frequent case in open source
software at least)? Instead of separator lines, does the software rely on
some particular header (under System V, some mailers use a content-length
header to determine where the end of a message occurs)? Does the file
format consist of some sort of lead-in index? Or, as is the case for some
user mail agent packages, is each message stored in its own file?
The second problem is even worse, in my opinion. That is the factor of file
locking. Each mail program is free to (and appears to) make use of what
its author(s) deem as most useful file locking. Of course, if one's
mail files are on NFS, then some file locking schemes are less successful
than others. But if an additional utility to, say, delete messages, were
to be written, the author needs to account for the locking styles of ANY
program which might update that box, so as not to lose messages.
If the above two problems were solvable, then the method to 'delete'
a message would be to write all the messages before and after the
message in question to a new file. The author would probably open the
original file for exclusive access, read in the data and write out the
remaining messages to a new file, then somehow rename the two files in
a way that would minimize the window in which one might lose a message.
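A Python sketch of the rewrite-and-rename approach, assuming the caller holds the mailbox lock for the whole operation and already knows the byte ranges of the messages to drop. The temporary file is created in the same directory so that `os.replace` is an atomic rename on POSIX file systems. `rewrite_without` is a hypothetical name; it keeps the permission bits but not the ownership, and after the rename any lock held on the old file's descriptor no longer covers the new file.

```python
import os
import tempfile


def rewrite_without(path, delete_ranges):
    """Write a copy of the mailbox without the byte ranges [start, end) in
    delete_ranges, then swap it in with an atomic rename.

    Assumes the caller holds the mailbox lock throughout and that the ranges
    do not overlap. Permission bits are kept; ownership is not.
    """
    directory = os.path.dirname(os.path.abspath(path))
    mode = os.stat(path).st_mode & 0o7777
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".mbox-")
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            pos = 0
            for start, end in sorted(delete_ranges):
                dst.write(src.read(start - pos))   # bytes before this gap
                src.seek(end)                      # skip the deleted message
                pos = end
            dst.write(src.read())                  # everything after the last gap
            dst.flush()
            os.fsync(dst.fileno())                 # data on disk before the rename
        os.chmod(tmp, mode)
        os.replace(tmp, path)                      # atomic on POSIX file systems
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```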
It is a non-trivial task to do right.
:3.2 The file can always be written (without the deleted contents) to a
:new version. The updated version replaces the old one. Is this
:applicable if you think about speed?
Unfortunately, there are no better options, at least on Unix and given the
above restrictions.
Besides storing each message in a separate file, another option might be
to store emails in a database (à la Outlook, etc.)
--
Tcl - The glue of a new generation. <URL: http://wiki.tcl.tk/ >
Even if explicitly stated to the contrary, nothing in this posting
should be construed as representing my employer's opinions.
<URL: mailto:lvi...@yahoo.com > <URL: http://www.purl.org/NET/lvirden/ >
Indeed, this is sufficiently nasty that I'd suggest doing everything using POP
or IMAP if the mailbox is accessible via that route! At least then you could
use the assumption that someone has configured the delivery software and mailbox
access software to work together in terms of details like locking. (If they
haven't... <FX: shiver of fear and loathing>) Remote mail access protocols may
well suck, but at least they are defined protocols...
Donal.
--
Donal K. Fellows http://www.cs.man.ac.uk/~fellowsd/ donal....@man.ac.uk
-- Thanks, but I only sleep with sentient lifeforms. Anything else is merely
a less sanitary form of masturbation.
-- Alistair J. R. Young <avatar...@arkane.demon.co.uk>
- Thanks for your good explanation.
> lvi...@yahoo.com wrote:
> > According to Markus Elfring <Markus....@web.de>:
> >: This request's subject is the deletion of messages in the file "mbox".
> > The second problem is even worse, in my opinion. That is the factor of file
> > locking. Each mail program is free to (and appears to) make use of what
> > its author(s) deem as most useful file locking. Of course, if one's
> > mail files are on NFS, then some file locking schemes are less successful
> > than others. But if an additional utility to, say, delete messages, were
> > to be written, the author needs to account for the locking styles of ANY
> > program which might update that box, so as not to lose messages.
>
> Indeed, this is sufficiently nasty that I'd suggest doing everything using POP
> or IMAP if the mailbox is accessible via that route! At least then you could
If someone has Tcl code handling the IMAP protocol, please consider
contributing it to tcllib.
--
Sincerely,
Andreas Kupries <akup...@shaw.ca>
Developer @ <http://www.activestate.com/>
Private <http://www.purl.org/NET/akupries/>
>This request's subject is the deletion of messages in the file "mbox".
I think the safest approach would be to go through POP. For Perl, see
the modules Mail::POP3Client and Net::POP3. Search for methods with a
name like "delete" (not necessarily with that case).
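For comparison with the Perl modules, a rough equivalent using Python's standard `poplib` module is sketched below. It assumes an already logged-in `poplib.POP3` or `POP3_SSL` object and selects messages by exact Subject match purely for illustration; `delete_by_subject` is a hypothetical name. TOP is an optional POP3 command, so a given server may not support it, and messages marked with `dele` are only removed when the session ends with `quit()`.

```python
from email.parser import BytesHeaderParser


def delete_by_subject(pop, subject):
    """Mark every message whose Subject header equals `subject` for deletion.

    `pop` is assumed to be a logged-in poplib.POP3 or poplib.POP3_SSL object.
    The server only removes the messages when the session ends with quit().
    Returns the message numbers that were marked.
    """
    parser = BytesHeaderParser()
    marked = []
    _, listing, _ = pop.list()                 # entries like b"1 1234"
    for entry in listing:
        num = int(entry.split()[0])
        _, lines, _ = pop.top(num, 0)          # headers only, no body lines
        headers = parser.parsebytes(b"\n".join(lines))
        if headers.get("Subject") == subject:
            pop.dele(num)
            marked.append(num)
    return marked


# Possible usage (host and credentials are placeholders):
#   import poplib
#   pop = poplib.POP3_SSL("pop.example.com")
#   pop.user("user"); pop.pass_("password")
#   delete_by_subject(pop, "obsolete report")
#   pop.quit()     # deletions are committed here
```

Going through the server this way leaves locking to the POP3 daemon, which is the point of the suggestion.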
--
Bart.
>[fu-t set]
>
>in comp.programming i read:
>
>>The term "SMTP mail" that I've chosen specifies a message according to
>>the standard "RFC 822" (http://ietf.org/rfc/rfc0822.txt). This format
>>can be sent over the communication protocol "SMTP" and some others.
>
>rfc 2822 is the current standard. almost no system stores that format
>unmodified, e.g., the unix mbox format prefixes it with a `from ' line and
>prefixes lines that start with that string with a `>', and may retain the
>smtp envelope as return-path and an x- headers. but even that's not
>universal, e.g., solaris doesn't write a `from ' line instead it writes a
>content-length header which is exactly correct.
>
Here the case makes a difference: the message delimiter starts with
"From " not "from ". Solaris uses the SVR4 style with Content-Length but also
includes the "From " delimiter, and that may create problems when naive
programs access these mailboxes thinking they are old-style unix mailboxes,
thereby ignoring the Content-Length and failing to fix the Content-Length if
the size of the message changes by adding status flags to the mail headers.
Villy
A POP3 server will use whatever message base it's been programmed to use.
Bill, over an RFC1149 gateway.
>I think that a POP3 server will not use an mbox file.
Not directly, no. You are supposed to connect to a POP3 server on your
machine, which will do the manipulation for you.
--
Bart.
Now that depends on the configuration of the POP3 server, yes? OK, so it
probably won't use a *local* mbox file. ;^)
If you have a local mbox file that is not being delivered to, then all you've
got is a fairly trivial parse-into-list, remove-list-element, list-back-to-file
problem, and all you need to know is the format of the list (messages start with
a line matching "^From " in all mbox implementations I've ever seen, though you
should heed the advice given elsewhere on this thread.)
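For this simple case, Python's standard `mailbox` module already implements the parse/remove/write-back cycle. A minimal sketch, assuming the delivery agent's locking is compatible with what `mailbox.mbox.lock()` does (dot-locking plus system-call locks; see the module documentation). `delete_messages` and the predicate are illustrative names.

```python
import mailbox


def delete_messages(path, predicate):
    """Remove every message for which predicate(message) is true.

    mailbox.mbox parses the file into messages keyed by integers; remove()
    marks them and flush() writes the remaining messages back out.
    """
    box = mailbox.mbox(path, create=False)
    box.lock()                     # must match the delivery agent's locking
    try:
        doomed = [key for key, msg in box.iteritems() if predicate(msg)]
        for key in doomed:
            box.remove(key)
        box.flush()                # rewrite the file without the removed messages
    finally:
        box.unlock()
        box.close()
    return len(doomed)
```

For example, `delete_messages("/var/mail/user", lambda m: m["Subject"] == "old news")` would drop every message with that exact subject.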
If you've got something that is a delivery target, you *must* match yourself
with the locking protocol or you run the risk of losing mail. (Sometimes you
can also use hard-links to move the file out of the way atomically, but that's
not generally applicable...) If someone's got a POP3 server set up[*] (to allow
for remote access to mail) then that'd work really well. Another alternative is
to use the mailx MUA via a pipe, again because it should already know the right
locking protocol.
Donal (leveraging someone else's work is *the* way to go! :^)
[* Assuming they've matched locking protocols, but it'd be really incompetent
for a system installer or admin to foul that up. ]
[...]
> This request's subject is the deletion of messages in the file "mbox".
> 1. Some programmes provide the prompt/command line as their
> programming interface.
> The deletion can be performed manually by typing the letter "d" with a
> following message number inside the programme "mail". (This step can
> be performed by a script, too.)
>
> I encounter the difficulty to scroll through all existing messages
> by a command to find the right number for the message that should be
> deleted. I am looking for an efficient algorithm.
[...]
> A problem detail is also to acquire a lock for the write access to
> update the file.
preenmail is made for such things and many others. Examine the README
at ftp://ftp.rpi.edu/home/89/sofkam/public/preenmail
tony
--
use hotmail.com for any email replies
>>rfc 2822 is the current standard. almost no system stores that format
>>unmodified, e.g., the unix mbox format prefixes it with a `from ' line
>Here the case makes a difference: The message delimiter start with "From
>" not "from ".
yes, i should have been more careful about that -- thanks.
Do you really suggest the "Standard for the Transmission of IP
Datagrams on Avian Carriers" as a gateway service? ;-)
- http://ietf.org/rfc/rfc1149.txt
- Let us perform the POP3 protocol with birds
http://www.blug.linux.no/rfc1149/
"Avian carriers", "birds", "penguins". What, are we going to convert the
whole internet from the usual IP to one based on homing pigeons? ;-)
/Al