The term "SMTP mail" that I've chosen refers to a message that conforms to the standard "RFC 822" (http://ietf.org/rfc/rfc0822.txt). This format can be sent over the communication protocol "SMTP" and some others.
Excerpt from the book "Programming Internet Email" - Chapter 4 "MIME-compliant messages", Page 60: "... the simplest MIME-compliant message is just an RFC 822 simple text message with an added MIME-Version header. ..."
This request's subject is the deletion of messages in the file "mbox". 1. Some programmes provide the prompt/command line as their programming interface. The deletion can be performed manually by typing the letter "d" followed by a message number inside the programme "mail". (This step can be performed by a script, too.)
I find it difficult to scroll through all existing messages with a command in order to find the number of the message that should be deleted. I am looking for an efficient algorithm.
2. I've noticed that the "landscape" of comfortable programming interfaces has improved. 2.1 PHP http://pear.php.net/packages.php?catpid=14&catname=Mail What a pity: the package "mailparse" has the state "0.9.1 beta" and the package "Mail_Mbox" has the state "0.1.5 alpha".
2.3 Perl http://search.cpan.org/modlist/Mail_and_Usenet_News/Mail/ The package "Mail::Box" (Mail folder manager and MUA backend, http://perl.overmeer.net/mailbox/) has got the state "2.034 beta". The package "Mail::MboxParser" (read-only access to UNIX-mailboxes) has got the state "0.38". The package "Mail::Field" (Base class for manipulation of mail header fields) has got the state "1.58".
2.6 Java http://java.sun.com/products/javamail/ The package "JavaMail" has the state "1.3, 26 June 2002". Is a protocol provider for the format "Mbox" available?
3. A method for the deletion seems to be missing. I come to a fundamental programming problem now. How can a piece be cut out from the file "mbox" that can be huge in size and that is accessed rapidly and perhaps without breaks while new messages are appended to the file end? (The programme "mailx" has got a built-in solution. I don't want to look at the source files.)
3.1 I do not know yet how to put the ends of the gap together again.
3.2 The file can always be written (without the deleted contents) to a new version. The updated version replaces the old one. Is this applicable if you think about speed? A problem detail is also to acquire a lock for the write access to update the file.
>The term "SMTP mail" that I've chosen specifies a message according to >the standard "RFC 822" (http://ietf.org/rfc/rfc0822.txt). This format >can be send over the communication protocol "SMTP" and some others.
rfc 2822 is the current standard. almost no system stores that format unmodified, e.g., the unix mbox format prefixes it with a `from ' line and prefixes lines that start with that string with a `>', and may retain the smtp envelope as return-path and x- headers. but even that's not universal, e.g., solaris doesn't write a `from ' line; instead it writes a content-length header which is exactly correct.
>This request's subject is the deletion of messages in the file "mbox". >1. Some programmes provide the prompt/command line as their >programming interface.
that is a human interface. a programming interface is typically present in the form of scripts manipulating variables and calling functions. libraries are available that can manipulate various mailbox formats, e.g., c-client from the university of washington.
>I encounter the difficulty to scroll through all exististing messages >by a command to find the right number for the message that should be >deleted. I am looking for an efficient algorithm.
there is no particularly efficient algorithm, you must read the entire file, collecting all the sensible headers (e.g., from, subject) as you progress. to make future manipulation more efficient you can record the position within the file of each message.
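A minimal sketch of such a scan, in Python. It assumes the classic layout in which every message starts with a line beginning "From " and body lines that begin that way have been quoted with ">". The function name index_mbox and the choice of which headers to keep are illustrative only, and a Content-Length style mailbox would need a different scanner.

```python
def index_mbox(path):
    """Return one (offset, length, headers) tuple per message in an mbox file.

    Assumption: messages are delimited by lines starting with b"From ".
    Only the first From: and Subject: header of each message is collected.
    """
    entries = []
    with open(path, "rb") as f:
        start = None          # byte offset of the current message
        headers = {}
        in_headers = False
        pos = 0               # byte offset of the line being examined
        for line in f:
            if line.startswith(b"From "):
                if start is not None:
                    entries.append((start, pos - start, headers))
                start, headers, in_headers = pos, {}, True
            elif in_headers:
                if line in (b"\n", b"\r\n"):
                    in_headers = False      # blank line ends the header block
                elif b":" in line and not line[:1].isspace():
                    name, _, value = line.partition(b":")
                    key = name.strip().lower().decode("ascii", "replace")
                    if key in ("from", "subject"):
                        headers.setdefault(
                            key, value.strip().decode("utf-8", "replace"))
            pos += len(line)
        if start is not None:
            entries.append((start, pos - start, headers))
    return entries
```

The saved offsets are only valid as long as nobody else rewrites the file, so they should be re-checked before they are used for a deletion.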
>3. A method for the deletion seems to be missing. I come to a >fundamental programming problem now. >How can a piece be cut out from the file "mbox" that can be huge in >size and that is accessed rapidly and perhaps without breaks while new >messages are appended to the file end?
nothing can be done until you have your system's locking mechanism implemented.
let me reiterate that: you cannot manipulate an mbox-style mailbox without having obtained a proper lock. and there is no universal lock that always works; each system has its own notion of what is appropriate. once you have the lock no other changes will be allowed, i.e., no new messages will be added by the lda and no other processes should be manipulating it. this is one of the reasons that other mailbox formats (e.g., mh, maildir and mbx) have been created.
once the mailbox is locked you can use a shuffle loop (seek/read/seek/write) to copy the portion of the mailbox that follows the deleted message to its new position (over the top of the deleted message), truncating the file to the new length. if there are multiple messages to be deleted then you can optimize this by doing all the messages in a single pass.
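The shuffle loop might look roughly like the following Python sketch. It does no locking at all and assumes the caller already holds whatever lock the local delivery agent honours. The name delete_ranges_in_place and the (offset, length) representation of the deleted messages are made up for illustration, and the ranges are assumed not to overlap.

```python
import os

def delete_ranges_in_place(f, ranges, chunk=64 * 1024):
    """Remove byte ranges [(offset, length), ...] from the open file f.

    f must be opened for reading and writing in binary mode ("r+b").
    All ranges are handled in one pass: the data after each deleted
    range is moved up, then the file is truncated to its new length.
    """
    ranges = sorted(ranges)
    f.seek(0, os.SEEK_END)
    end = f.tell()
    if not ranges:
        return end
    write_pos = ranges[0][0]            # first byte to be overwritten
    for i, (off, length) in enumerate(ranges):
        read_pos = off + length         # first byte to keep after this gap
        stop = ranges[i + 1][0] if i + 1 < len(ranges) else end
        while read_pos < stop:
            f.seek(read_pos)
            data = f.read(min(chunk, stop - read_pos))
            if not data:
                break
            f.seek(write_pos)
            f.write(data)
            read_pos += len(data)
            write_pos += len(data)
    f.truncate(write_pos)
    return write_pos
```

If the process dies halfway through, the mailbox is left partly shuffled, which is one argument for the copy-and-rename approach when disk space allows it.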
now let me back-pedal some: if you are only deleting messages then since you will only be moving messages `up' in the file you can risk doing so without locking the file, but you must take great care to: re-validate the seek positions if you saved them during the initial scan and abort if there is any change (because another copy of your program is running), and create a dummy message at the end of the file that *exactly* consumes the space you eliminate and check for a size increase before you change the file length. if the file increased in size then a new message or messages have arrived which you'll have to shuffle into place (i.e., repeat these steps).
>(The programme "mailx" has got a built-in solution. >3.1 I do not know yet how to put the ends of the gap together again.
perhaps scripting interaction with mailx is what you want to do then.
>3.2 The file can always be written (without the deleted contents) to a >new version. The updated version replaces the old one. Is this >applicable if you think about speed?
it can be acceptable, on good systems. but if the mailbox is huge it's better to use a different format (mh, maildir or mbx). in fact i would be tempted to do it all via imap, provided such a service can be enabled.
According to Markus Elfring <Markus.Elfr...@web.de>: :This request's subject is the deletion of messages in the file "mbox".
:The deletion can be performed manually by typing the letter "d" with a :following message number inside the programmes "mail". (This step can :be performed by a script, too.) : :I encounter the difficulty to scroll through all exististing messages :by a command to find the right number for the message that should be :deleted. I am looking for an efficient algorithm.
I don't believe that the format of the physical storage of mail messages within a file is documented in a standard. Neither, unfortunately, is the protocol for interacting with such a file. This leaves anyone wanting to do things with the box in a quandary. First, one has no guaranteed way of knowing how the messages are stored. Are they merely a series of messages appended one to another, with perhaps some sort of initial line indicating something about the message (the most frequent case in open source software at least)? Does the format, instead of separator lines, rely on the software making use of some particular header (under System V, some mailers use a content-length header to determine where the end of a message occurs)? Or does the file format consist of some sort of lead-in index? Or, as is the case for some mail user agent packages, is each message stored in its own file?
The second problem is even worse, in my opinion. That is the matter of file locking. Each mail program is free to (and appears to) use whatever file locking its author(s) deem most useful. Of course, if one's mail files are on NFS, then some file locking schemes are less successful than others. But if an additional utility to, say, delete messages, were to be written, the author needs to account for the locking styles of ANY program which might update that box, so as not to lose messages.
If the above two problems were solvable, then the method to 'delete' a message would be to write all the messages before and after the message in question to a new file. The author would probably open the original file for exclusive access, read in the data and write out the remaining messages to a new file, then somehow rename the two files in a way that would minimize the window in which one might lose a message.
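As a rough Python sketch of that copy-and-rename idea: the function below copies everything except the given byte ranges to a temporary file in the same directory and then renames it over the original. The names rewrite_without and _copy are illustrative, the ranges are assumed to be non-overlapping (offset, length) pairs found by an earlier scan, and no locking is done here, so a message delivered between the copy and the rename would be lost unless the caller holds a lock the delivery agent respects.

```python
import os
import tempfile

def _copy(src, dst, start, count, chunk):
    """Copy count bytes from src (starting at start) to dst."""
    src.seek(start)
    while count > 0:
        data = src.read(min(chunk, count))
        if not data:
            break
        dst.write(data)
        count -= len(data)

def rewrite_without(path, ranges, chunk=64 * 1024):
    """Replace path with a copy that omits the given (offset, length) ranges."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory, so the final rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".mbox-")
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            pos = 0
            for off, length in sorted(ranges):
                _copy(src, dst, pos, off - pos, chunk)
                pos = off + length
            src.seek(0, os.SEEK_END)
            _copy(src, dst, pos, src.tell() - pos, chunk)
            dst.flush()
            os.fsync(dst.fileno())
        # Note: the temp file has mkstemp's default permissions; a real
        # tool would also restore the mailbox's owner and mode.
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

The rename itself is atomic on POSIX filesystems, so readers see either the old file or the new one, never a half-written mixture.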
It is a non-trivial task to do right.
:3.2 The file can always be written (without the deleted contents) to a :new version. The updated version replaces the old one. Is this :applicable if you think about speed?
Unfortunately, there are no other options, at least on Unix and given the above restrictions.
Besides storing each message in a separate file, another option might be to store emails in a database (à la Outlook, etc.)
-- Tcl - The glue of a new generation. <URL: http://wiki.tcl.tk/ > Even if explicitly stated to the contrary, nothing in this posting should be construed as representing my employer's opinions. <URL: mailto:lvir...@yahoo.com > <URL: http://www.purl.org/NET/lvirden/ >
lvir...@yahoo.com wrote: > According to Markus Elfring <Markus.Elfr...@web.de>: >: This request's subject is the deletion of messages in the file "mbox". > The second problem is even worse, in my opinion. That is the factor of file > locking. Each mail program is free to (and appears to) make use of what > its author(s) deem as most useful file locking. Of course, if one's > mail files are on NFS, then some file locking schemes are less successful > than others. But if an additional utility to, say, delete messages, were > to be written, the author needs to account for the locking styles of ANY > program which might update that box, so as not to lose messages.
Indeed, this is sufficiently nasty that I'd suggest doing everything using POP or IMAP if the mailbox is accessible via that route! At least then you could use the assumption that someone has configured the delivery software and mailbox access software to work together in terms of details like locking. (If they haven't... <FX: shiver of fear and loathing>) Remote mail access protocols may well suck, but at least they are defined protocols...
Donal. -- Donal K. Fellows http://www.cs.man.ac.uk/~fellowsd/ donal.fell...@man.ac.uk -- Thanks, but I only sleep with sentient lifeforms. Anything else is merely a less sanitary form of masturbation. -- Alistair J. R. Young <avatar-use...@arkane.demon.co.uk>
"Donal K. Fellows" <donal.k.fell...@man.ac.uk> writes:
> lvir...@yahoo.com wrote: > > According to Markus Elfring <Markus.Elfr...@web.de>: > >: This request's subject is the deletion of messages in the file "mbox". > > The second problem is even worse, in my opinion. That is the factor of file > > locking. Each mail program is free to (and appears to) make use of what > > its author(s) deem as most useful file locking. Of course, if one's > > mail files are on NFS, then some file locking schemes are less successful > > than others. But if an additional utility to, say, delete messages, were > > to be written, the author needs to account for the locking styles of ANY > > program which might update that box, so as not to lose messages.
> Indeed, this is sufficiently nasty that I'd suggest doing everything using POP > or IMAP if the mailbox is accessible via that route! At least then you could
If someone has tcl code handling the IMAP protocol please consider contributing this to tcllib.
Markus Elfring wrote: >This request's subject is the deletion of messages in the file "mbox".
I think the safest approach would be to go through POP. For Perl, see the modules Mail::POP3Client and Net::POP3. Search for methods with a name like "delete" (not necessarily with that case).
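For comparison, the same idea with Python's standard poplib module might look like the sketch below. The function takes an already logged-in connection object; the name delete_matching and the choice of matching on the Subject header are purely illustrative. It relies on the TOP command, which is optional in POP3, so a server without it would need RETR instead.

```python
import poplib
from email.parser import BytesHeaderParser

def delete_matching(conn, wanted_subject):
    """Mark every message whose Subject equals wanted_subject for deletion.

    conn is a logged-in poplib.POP3 (or POP3_SSL) connection. The server
    only removes the messages when the session ends with conn.quit().
    """
    deleted = []
    count, _size = conn.stat()
    for number in range(1, count + 1):
        # TOP n 0 fetches the headers only, not the body.
        _resp, lines, _octets = conn.top(number, 0)
        headers = BytesHeaderParser().parsebytes(b"\r\n".join(lines))
        if headers["Subject"] == wanted_subject:
            conn.dele(number)
            deleted.append(number)
    return deleted

# Hypothetical usage (host and credentials are placeholders):
#   conn = poplib.POP3_SSL("mail.example.org")
#   conn.user("markus")
#   conn.pass_("secret")
#   delete_matching(conn, "unwanted")
#   conn.quit()
```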
>>The term "SMTP mail" that I've chosen specifies a message according to >>the standard "RFC 822" (http://ietf.org/rfc/rfc0822.txt). This format >>can be send over the communication protocol "SMTP" and some others.
>rfc 2822 is the current standard. almost no system stores that format >unmodified, e.g., the unix mbox format prefixes it with a `from ' line and >prefixes lines that start with that string with a `>', and may retain the >smtp envelope as return-path and an x- headers. but even that's not >universal, e.g., solaris doesn't write a `from ' line instead it writes a >content-length header which is exactly correct.
Here the case makes a difference: the message delimiter starts with "From ", not "from ". Solaris uses the SVR4 style with Content-Length but also includes the "From " delimiter, and that may create problems when naive programs access these mailboxes thinking they are in the old unix mailbox format, and thereby ignore the Content-Length and don't fix the Content-Length if the size of the message changes when status flags are added to the mail headers.
Markus Elfring wrote: > I think that a POP3 server will not use a mbox file.
Now that depends on the configuration of the POP3 server, yes? OK, so it probably won't use a *local* mbox file. ;^)
If you have a local mbox file that is not being delivered to, then all you've got is a fairly trivial parse-into-list,remove-list-element,list-back-to-file problem, and all you need to know is the format of the list (messages start with a line matching "^From " in all mbox implementations I've ever seen, though you should heed the advice given elsewhere on this thread.)
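For the simple local case, Python's standard mailbox module already does the parse, remove and write-back steps. The sketch below is one way to use it; delete_by_subject is an illustrative name, and matching on the Subject header is just an example criterion. The module's lock() uses dot locking plus flock()/lockf() where available, which may or may not match what the local delivery agent expects.

```python
import mailbox

def delete_by_subject(path, subject):
    """Remove every message with the given Subject from an mbox file.

    Returns the number of messages removed.
    """
    box = mailbox.mbox(path, create=False)
    box.lock()
    try:
        # Collect the keys first, then remove, so the mailbox is not
        # modified while it is being iterated.
        doomed = [key for key, msg in box.iteritems()
                  if msg["subject"] == subject]
        for key in doomed:
            box.remove(key)
        box.flush()      # writes the remaining messages back to disk
    finally:
        box.unlock()
        box.close()
    return len(doomed)
```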
If you've got something that is a delivery target, you *must* match yourself with the locking protocol or you run the risk of losing mail. (Sometimes you can also use hard-links to move the file out of the way atomically, but that's not generally applicable...) If someone's got a POP3 server set up[*] (to allow for remote access to mail) then that'd work really well. Another alternative is to use the mailx MUA via a pipe, again because it should already know the right locking protocol.
Donal (leveraging someone else's work is *the* way to go! :^) [* Assuming they've matched locking protocols, but it'd be really incompetent for a system installer or admin to foul that up. ] -- Donal K. Fellows http://www.cs.man.ac.uk/~fellowsd/ donal.fell...@man.ac.uk -- The small advantage of not having California being part of my country would be overweighed by having California as a heavily-armed rabid weasel on our borders. -- David Parsons <o r c @ p e l l . p o r t l a n d . o r . u s>
> This request's subject is the deletion of messages in the file "mbox". > 1. Some programmes provide the prompt/command line as their > programming interface. > The deletion can be performed manually by typing the letter "d" with a > following message number inside the programmes "mail". (This step can > be performed by a script, too.)
> I encounter the difficulty to scroll through all exististing messages > by a command to find the right number for the message that should be > deleted. I am looking for an efficient algorithm. [...] > A problem detail is also to acquire a lock for the write access to > update the file.
>On 19 Jan 2003 08:08:30 GMT, Someone wrote: >>rfc 2822 is the current standard. almost no system stores that format >>unmodified, e.g., the unix mbox format prefixes it with a `from ' line >Here the case makes a difference: The message delimiter start with "From >" not "from ".
yes, i should have been more careful about that -- thanks.