Perform actions on all messages of a Mailbox

David

Oct 17, 2016, 9:52:19 AM
to JMAP
Hi all,

I was investigating a way for a JMAP client to empty a folder (the usual "Empty Trash" scenario) and it turned out there's nothing in the spec allowing me to do this.
The only possible solution would be to fetch Message IDs with a call to getMessageList and then destroy all these IDs with a call to setMessages. Is my understanding right or am I missing something?

If I'm right, Linagora would like to propose a simple but effective way to implement such "mass" actions on all messages in a given Mailbox: the setFilteredMessages API. This API will be very similar to setMessages but will apply update and/or destroy operations on messages matching a filter (the filter would behave exactly as the filter option of getMessageList: it will select messages matching the filter expression). Two examples below:
  • Empty a folder
setMessages({
  filter: {
    inMailboxes: ['<my-mailbox-id>']
  },
  destroy: true
})
  • Mark all messages from me as read
setMessages({
  filter: {
    from: 'ddo.li...@gmail.com'
  },
  update: {
    isUnread: false
  }
})

Some restrictions would apply, mainly:
  • destroy MUST be a Boolean value. If True, matched messages are permanently destroyed.
  • update MUST be an object containing the updated properties. The server MUST apply the modifications to the matched messages.

What do you all think? Once we agree on the proposal, I can prepare a PR to the spec quickly.

Thanks,
Regards,

David

Bron Gondwana

Oct 17, 2016, 6:05:05 PM
to jmap-d...@googlegroups.com
This is a very sharp tool!  Particularly since a common programming error would be to mess up the filter and send something which matched every message on the server.  This is the equivalent of "DELETE FROM Messages;" via SQL.  I'd want a few things for safety:

1) a return code which says "filterTooBroad" or something if you send an empty filter.

2) a strong suggestion in the implementation docs that you use ifInState to avoid race conditions with new things being delivered that you didn't mean to include.
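
A minimal sketch of how a server might enforce both safety points before running a filtered update. The function name `checkFilteredSet` and the shape of the returned error object are hypothetical, and `stateMismatch` is an assumed error name for an ifInState mismatch; only `filterTooBroad` and `ifInState` come from this thread.

```javascript
// Returns an error object if the request should be refused, or null if it
// may proceed. `currentState` is the server's current message state string.
function checkFilteredSet(args, currentState) {
  const filter = args.filter;

  // Point 1: an absent or empty filter would match every message.
  const isEmpty = filter == null || Object.keys(filter).length === 0;
  if (isEmpty) {
    return { type: 'filterTooBroad' };
  }

  // Point 2: if the client supplied ifInState, refuse when the mailbox
  // has changed since the client last synced (assumed error name).
  if (args.ifInState !== undefined && args.ifInState !== currentState) {
    return { type: 'stateMismatch' };
  }

  return null;
}
```

A client that always sends `ifInState` would get a refusal rather than a surprise if new mail arrived between composing the filter and sending the request.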

...

Having said that, I'm generally in favour of protocols allowing you to express your intent, and "delete everything in my Trash folder" is a common user intent.  As is "move all the messages matching this expression into a folder", so:


setMessages({
  filter: {
    from: 'br...@fastmail.fm'
  },
  update: {
    folderId: "xyz",
    isUnread: false,
  }
})

Yep, I could see that being useful.  It would blow your cache pretty badly with the large getMessageUpdates in response, but a client that didn't care about updates could do bulk operations without having to know all the IDs.

From a server implementation standpoint, this is quite easy to write as well, because you already have something for converting a filter into a list of IDs, and something for applying changes to messages based on ID.
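
As a rough illustration of that composition, here is a sketch over a hypothetical in-memory store (a `Map` from message ID to a plain object). The helper names `idsMatching` and `applyToIds`, the single `mailboxId` field, and the exact-match handling of `from` are simplifying assumptions for the example, not anything defined by the spec.

```javascript
// Step 1: convert a filter into a list of IDs.
// Supports only the two conditions used in this thread's examples.
function idsMatching(store, filter) {
  const ids = [];
  for (const [id, msg] of store) {
    const inBox = !filter.inMailboxes || filter.inMailboxes.includes(msg.mailboxId);
    const fromOk = !filter.from || msg.from === filter.from;
    if (inBox && fromOk) ids.push(id);
  }
  return ids;
}

// Step 2: apply a change to messages by ID (destroy wins over update).
function applyToIds(store, ids, { update, destroy }) {
  for (const id of ids) {
    if (destroy) {
      store.delete(id);
    } else {
      Object.assign(store.get(id), update);
    }
  }
}

// The filtered variant is just the two steps chained together.
function setFilteredMessages(store, { filter, update, destroy }) {
  const ids = idsMatching(store, filter);
  applyToIds(store, ids, { update, destroy });
  return ids;
}
```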

Bron.
--
  Bron Gondwana


Neil Jenkins

Oct 17, 2016, 6:18:24 PM
to JMAP Mailing List
I don't think this should be added to the JMAP spec. One of the concerns raised by a large mailbox provider we talked to was to make sure a client could be rate limited in a reasonable manner, so it can't overload the server. (We've also been careful the other way to try to ensure the client can control exactly how much data it requests from the server in one go.) Adding a command like this means the client could ask the server to do something potentially very expensive, depending on backend implementation.

Now the server could reject it if it's over a certain number of messages resulting in the query, but the exact limit will be server dependent and when it happens the client has to fall back to a different approach. Having two different implementations in the client is likely to be less tested and buggier.

In general, JMAP prefers the philosophy of explicitly telling the server what changes to make; this is often more efficient anyway if you're keeping the client cache in sync.

The approach we take to this problem (and I would recommend) is to fetch the list of ids up front (in pages if necessary), then ask the server to make the changes to them in batches (say 100 to 500 at a time), waiting for the previous request to finish before making the next one. The ids (should be) reasonably small and quick to fetch even for large folders. Fetching them up front ensures you don't process anything that arrives during the operation. By doing it a batch at a time, you can make sure you won't overload the server (and make sure the server will accept the request), and also more easily show a progress bar to the user (because the user is probably locked on the server while the changes are being made), or even interleave other requests to keep the client responsive while a large operation is happening in the background.
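
A minimal client-side sketch of that approach. The `call(method, args)` function is a hypothetical injected transport that sends one method call and resolves with its response arguments; the argument and response shapes (`position`, `limit`, `messageIds`, `destroy`) are assumptions modelled on the method names used in this thread, and the page and batch sizes are illustrative.

```javascript
// Collect every message ID matching `filter`, one page at a time.
// Stops on the first empty page, so it also works if the server
// returns fewer IDs per page than requested.
async function fetchAllIds(call, filter, pageSize = 1000) {
  const ids = [];
  for (;;) {
    const page = await call('getMessageList', {
      filter,
      position: ids.length,
      limit: pageSize,
    });
    if (page.messageIds.length === 0) return ids;
    ids.push(...page.messageIds);
  }
}

// Destroy the IDs in fixed-size batches, waiting for each request to
// finish before sending the next one.
async function destroyInBatches(call, ids, batchSize = 500, onProgress = () => {}) {
  for (let start = 0; start < ids.length; start += batchSize) {
    const batch = ids.slice(start, start + batchSize);
    await call('setMessages', { destroy: batch });
    onProgress(Math.min(start + batchSize, ids.length), ids.length);
  }
}

// "Empty Trash": snapshot the IDs first, then delete exactly those.
async function emptyMailbox(call, mailboxId, options = {}) {
  const ids = await fetchAllIds(call, { inMailboxes: [mailboxId] }, options.pageSize);
  await destroyInBatches(call, ids, options.batchSize, options.onProgress);
  return ids.length;
}
```

Because the ID list is fully fetched before the first delete, mail delivered during the operation is left alone, and `onProgress` gives the UI a natural hook for a progress bar.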

I think adding "setMessages with filter" to the JMAP spec (and therefore forcing all servers to implement it) would be a mistake. There's a better approach to achieve the same goal, or you can add a custom extension for your own client/server if you really want.

Neil.

Matthieu Baechler

Oct 18, 2016, 6:16:16 AM
to JMAP
Hi Neil,


On Tuesday, October 18, 2016 at 00:18:24 UTC+2, Neil Jenkins wrote:
> I don't think this should be added to the JMAP spec. One of the concerns raised by a large mailbox provider we talked to was to make sure a client could be rate limited in a reasonable manner, so it can't overload the server. (We've also been careful the other way to try to ensure the client can control exactly how much data it requests from the server in one go.) Adding a command like this means the client could ask the server to do something potentially very expensive, depending on backend implementation.

Do people expect to use JMAP as the only protocol to access mailboxes? In IMAP, such very expensive methods are very common (expunge, modification with a very broad UID range, etc.).

> Now the server could reject it if it's over a certain number of messages resulting in the query, but the exact limit will be server dependent and when it happens the client has to fall back to a different approach. Having two different implementations in the client is likely to be less tested and buggier.
>
> In general, JMAP prefers the philosophy of explicitly telling the server what changes to make; this is often more efficient anyway if you're keeping the client cache in sync.

You can do that easily with ifInState: you already know client-side which messages will be changed, because you actually wrote the query.

> The approach we take to this problem (and I would recommend) is to fetch the list of ids up front (in pages if necessary), then ask the server to make the changes to them in batches (say 100 to 500 at a time), waiting for the previous request to finish before making the next one.

It doesn't look like a great API to me. Managing deletion with client-side batching for performance reasons doesn't sound good.
I think even a good implementation will consume more resources handling such large requests than it would doing the work server-side based on a query.

> The ids (should be) reasonably small and quick to fetch even for large folders. Fetching them up front ensures you don't process anything that arrives during the operation.

ifInState already covers this case, don't you think?

> By doing it a batch at a time, you can make sure you won't overload the server (and make sure the server will accept the request), and also more easily show a progress bar to the user (because the user is probably locked on the server while the changes are being made), or even interleave other requests to keep the client responsive while a large operation is happening in the background.

This could easily be solved with an "async" capability on requests. We already have Event Source for receiving async results. What do you think?

[...]

Regards,

--
Matthieu Baechler

Bron Gondwana

Oct 20, 2016, 7:15:18 PM
to jmap-d...@googlegroups.com
Sorry about the delay in replying to this.

On Tue, 18 Oct 2016, at 21:16, Matthieu Baechler wrote:
> Hi Neil,
>
> On Tuesday, October 18, 2016 at 00:18:24 UTC+2, Neil Jenkins wrote:
>> I don't think this should be added to the JMAP spec. One of the concerns raised by a large mailbox provider we talked to was to make sure a client could be rate limited in a reasonable manner, so it can't overload the server. (We've also been careful the other way to try to ensure the client can control exactly how much data it requests from the server in one go.) Adding a command like this means the client could ask the server to do something potentially very expensive, depending on backend implementation.
>
> Do people expect to use JMAP as the only protocol to access mailboxes? In IMAP, such very expensive methods are very common (expunge, modification with a very broad UID range, etc.).

Yes, they are.  They're also not undoable, and risky.  We have a self service "restore from Backup" tool at FastMail because "I accidentally deleted a ton of messages I didn't mean to" was a very common support request.

(that and "I store all my important email in the Trash folder because I'm insane, and I just hooked up an iPhone which wiped it", *sigh*)

>> Now the server could reject it if it's over a certain number of messages resulting in the query, but the exact limit will be server dependent and when it happens the client has to fall back to a different approach. Having two different implementations in the client is likely to be less tested and buggier.
>>
>> In general, JMAP prefers the philosophy of explicitly telling the server what changes to make; this is often more efficient anyway if you're keeping the client cache in sync.
>
> You can do that easily with ifInState: you already know client-side which messages will be changed, because you actually wrote the query.

>> The approach we take to this problem (and I would recommend) is to fetch the list of ids up front (in pages if necessary), then ask the server to make the changes to them in batches (say 100 to 500 at a time), waiting for the previous request to finish before making the next one.
>
> It doesn't look like a great API to me. Managing deletion with client-side batching for performance reasons doesn't sound good.
> I think even a good implementation will consume more resources handling such large requests than it would doing the work server-side based on a query.

I thought that too at first, and I initially made the same arguments, especially because we already had the ability to delete mailboxes and hence operate on the messages inside them.

>> The ids (should be) reasonably small and quick to fetch even for large folders. Fetching them up front ensures you don't process anything that arrives during the operation.
>
> ifInState already covers this case, don't you think?

It does, but the next getMessageUpdates will have to get a response for all those messages anyway, because you don't know if the client had cached anything for the IDs.

>> By doing it a batch at a time, you can make sure you won't overload the server (and make sure the server will accept the request), and also more easily show a progress bar to the user (because the user is probably locked on the server while the changes are being made), or even interleave other requests to keep the client responsive while a large operation is happening in the background.
>
> This could easily be solved with an "async" capability on requests. We already have Event Source for receiving async results. What do you think?

Thanks for raising this topic, because we did discuss it in a lot of detail, and we actually decided to go entirely the opposite direction!  Instead of having the deletion of a non-empty mailbox move its messages to the mailbox with the "inbox" role, we changed it so that you can't delete a mailbox which contains messages.  You need to explicitly delete them or move them out first.

There are some strong guiding principles in JMAP, and one of them is that messages are precious and actions should be explicit.

Our own Cyrus IMAPd server has algorithms built on the assumption that the biggest single mailbox will contain one million emails.  That's pretty big.  We have around 20 users with more than that many emails total across all their mailboxes (I know because we have a 32 bit file size issue with an internal cache file when you get to about 3 million messages in a single mailbox).

So looking at the most extreme case of deleting a million emails at the same time, every byte of per-message data adds up to one megabyte.  A reasonable ID size is 64 bits, which is 16 hexadecimal characters.  Add in commas and quotes, and you're looking at roughly 20 bytes per ID.  Multiply that by a million, and that's 20 megabytes of IDs to download and process.

Yes, it's a lot of data.  But 1 million emails is nearly 2 years' worth of getting one email per minute, all day, every day, and not deleting anything right up until you suddenly decide to wipe all million emails.

A more realistic number is 10,000 emails in a mailbox which is being wiped.  That's a week of one email per minute, which is about twice the rate that I get (and I get a ton of notify email and mailing lists).

10,000 IDs is 200 KB of data.  Half the webpages I go to are about that size.  And that's once per week on a really busy account, where you're downloading tons more data than that just to keep up with reading a fraction of your incoming email.
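
The arithmetic above can be checked with a few lines. This is only a back-of-the-envelope sketch: the function names are made up for the example, and the 20 bytes per ID is the rounded figure from this post (16 hex characters plus quotes and a comma).

```javascript
// Rounded per-ID cost used in the post: 16 hex chars + 2 quotes + 1 comma.
const BYTES_PER_ID = 20;

// Size of the ID list a client would have to download and send back.
function idListBytes(messageCount, bytesPerId = BYTES_PER_ID) {
  return messageCount * bytesPerId;
}

// How long it takes to accumulate that many messages at a steady rate.
function daysToAccumulate(messageCount, perMinute = 1) {
  return messageCount / (perMinute * 60 * 24);
}

console.log(idListBytes(1000000)); // 20000000 bytes, about 20 MB
console.log(idListBytes(10000));   // 200000 bytes, about 200 KB
console.log(daysToAccumulate(1000000).toFixed(0)); // "694" days, roughly 1.9 years
console.log(daysToAccumulate(10000).toFixed(1));   // "6.9" days, about a week
```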

So I'm not convinced that this "inefficiency" is actually a problem in practice.  The code to fetch the list of IDs and pass it back to the server isn't complex.  As Neil said, batching in groups of say 1024 messages allows you to display a progress bar as you delete the messages.  Implementing an "empty Trash" as a callback which gets all the IDs in the Trash folder and issues a delete for them all isn't much client side code, and it means that the protocol doesn't have the discontinuity.

It would be easy to implement an extension for "emptyMailbox" (which I think is the use case we're really looking for here, rather than arbitrary filter), but I've been convinced upon further examination that it shouldn't be in the base protocol.

Regards,

Bron.

--
  Bron Gondwana

