Getting a count of messages for the "All Mail" label via the Java API


Tomas Hajek

Jun 6, 2011, 5:05:30 PM
to google-app...@googlegroups.com
Can anyone suggest a way, via the Java API, to get the total message count in a user's mail store? I am migrating users' e-mail and would like to get a count of all the messages (so the "All Mail" label, presumably) in a destination user's mailbox. I'd like to get a count before and after a migration so that I can double-check the number of messages uploaded.

I'm looking at the Email Audit API now, but I'm not sure yet whether I'm looking in the right place. Does anyone have a suggestion?

thanks,
 -Tomas

Robert Norris

Jun 6, 2011, 7:44:26 PM
to google-app...@googlegroups.com
The best way to do it is via IMAP using XOAUTH. I don't know what client libraries are available for Java; it's easy enough if you understand IMAP, though (you get the message count back when you SELECT the folder).
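As a rough illustration of the SELECT remark above: the server's reply to SELECT includes an untagged "* <n> EXISTS" line, and <n> is the number of messages in the folder. The sketch below only parses reply lines that have already been read from the connection; it does not open a socket or perform XOAUTH. The class name, method name, and the sample reply are made up for illustration. A client library such as JavaMail would normally do this for you (its Folder.getMessageCount() method).

```java
import java.util.List;
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls the message count out of a SELECT reply.
final class SelectResponse {
    // Untagged "* <n> EXISTS" line, as defined by IMAP4rev1 (RFC 3501).
    private static final Pattern EXISTS =
            Pattern.compile("^\\* (\\d+) EXISTS$", Pattern.CASE_INSENSITIVE);

    /** Returns the count from the first EXISTS line, or empty if there is none. */
    static OptionalInt messageCount(List<String> responseLines) {
        for (String line : responseLines) {
            Matcher m = EXISTS.matcher(line.trim());
            if (m.matches()) {
                return OptionalInt.of(Integer.parseInt(m.group(1)));
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        // Made-up reply to: a1 SELECT "[Gmail]/All Mail"
        List<String> reply = List.of(
                "* FLAGS (\\Answered \\Flagged \\Draft \\Deleted \\Seen)",
                "* 76802 EXISTS",
                "* 0 RECENT",
                "a1 OK [READ-WRITE] SELECT completed");
        System.out.println(messageCount(reply).getAsInt());
    }
}
```

Running this before and after a migration gives the two numbers to compare. Using EXAMINE instead of SELECT returns the same EXISTS line but opens the folder read-only.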

Things to be aware of:
  • All Mail does not show messages currently in the Trash
  • The Gmail "special" folders are locale-dependent, so you can't guarantee that it will always be called "[Gmail]/All Mail". Use the undocumented XLIST command to find out: it operates exactly like LIST, except that it returns additional flags to indicate what the folders are for. You want the folder with the "\AllMail" flag.
  • The "All Mail" (and all other) folders/labels can be hidden from IMAP by the user using the Advanced IMAP Settings lab (which I believe recently graduated out of Labs). There's currently no way to get around this.
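To make the XLIST point concrete, here is a minimal sketch that scans XLIST reply lines for the folder carrying the "\AllMail" flag. It assumes the reply has the same shape as a LIST reply, as described above, with a quoted or bare folder name. It does not handle IMAP literals, escaped quotes, or modified UTF-7 folder names, and the class name and sample lines are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds the locale-specific name of the All Mail folder.
final class XlistParser {
    // "* XLIST (<flags>) <delimiter> <mailbox>", the same shape as a LIST reply.
    private static final Pattern XLIST = Pattern.compile(
            "^\\* XLIST \\(([^)]*)\\) (?:\"[^\"]*\"|NIL) (.+)$",
            Pattern.CASE_INSENSITIVE);

    /** Returns the name of the folder flagged \AllMail, or empty if none is listed. */
    static Optional<String> findAllMail(List<String> responseLines) {
        for (String line : responseLines) {
            Matcher m = XLIST.matcher(line.trim());
            if (!m.matches()) {
                continue; // tagged completion line or unrelated reply
            }
            boolean allMail = Arrays.stream(m.group(1).trim().split("\\s+"))
                    .anyMatch(flag -> flag.equalsIgnoreCase("\\AllMail"));
            if (allMail) {
                return Optional.of(stripQuotes(m.group(2)));
            }
        }
        return Optional.empty();
    }

    // Simplification: removes surrounding quotes only, no unescaping.
    private static String stripQuotes(String s) {
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            return s.substring(1, s.length() - 1);
        }
        return s;
    }

    public static void main(String[] args) {
        // Made-up reply for an account with a non-English locale.
        List<String> reply = List.of(
                "* XLIST (\\HasNoChildren \\Inbox) \"/\" \"Inbox\"",
                "* XLIST (\\HasNoChildren \\AllMail) \"/\" \"[Gmail]/Alle Nachrichten\"",
                "a2 OK Success");
        System.out.println(findAllMail(reply).orElse("(All Mail hidden or not listed)"));
    }
}
```

An empty result is worth treating as a distinct case, since the folder may have been hidden from IMAP by the user as noted in the last bullet.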
Cheers,
Rob.


--
You received this message because you are subscribed to the Google Groups "Google Apps Domain Information and Management APIs" group.
To view this discussion on the web visit https://groups.google.com/d/msg/google-apps-mgmt-apis/-/WlhpbkhfZlJtTVFK.
To post to this group, send email to google-app...@googlegroups.com.
To unsubscribe from this group, send email to google-apps-mgmt...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-apps-mgmt-apis?hl=en.

Tomas Hajek

Jun 7, 2011, 8:31:35 AM
to google-app...@googlegroups.com
Thanks Rob,
  My current interest is in the initial migration, so I shouldn't have an issue with users changing settings to hide labels or change locale. As for the Trash, I didn't know that it didn't show in All Mail, but I'm currently not migrating any messages into the Trash (it seems pointless), so that shouldn't change my count.
  I would imagine that, since I am working as a domain admin with accounts that have those privileges, I would be able to change any of the above settings via an API call to get around such things. I don't think that will be necessary, but it's something to think about.
  I've never really dealt with IMAP via Java, but I suppose I can look into it. I was really thinking that there would be a simple way to do this via existing API calls. I'm trying to work this all into a semi-automated Java application that will run on our e-mail message store against directories output in maildir format via OfflineIMAP.
-Tomas

Robert Norris

Jun 7, 2011, 7:28:36 PM
to google-app...@googlegroups.com
I just re-read your original message and thought of something else. Gmail filters out duplicate emails (determined by the Message-id: header field) so if you upload the same message twice for some reason it will only appear once in All Mail, which will skew your numbers. You'll need to keep track of the message ids for messages you upload and make sure you're not double-counting.

You'd think this wouldn't come up often but it does, particularly with mailing lists - a user might post to a list (copy of message in Sent) and then receive a copy from the list (in the inbox or filtered to a folder).
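One way to keep track of uploaded message ids, sketched under the assumption stated above that Gmail deduplicates purely on the Message-id header. The class and method names are made up for illustration. The tracker answers two questions: whether a message still needs uploading, and how many messages All Mail should contain afterwards.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical tracker for one destination mailbox.
final class UploadTracker {
    private final Set<String> seenIds = new HashSet<>();
    private int skipped = 0;

    /** Returns true the first time an id is seen, false for every later copy. */
    boolean shouldUpload(String messageId) {
        // Set.add returns false when the id is already present.
        boolean firstTime = seenIds.add(messageId.trim());
        if (!firstTime) {
            skipped++;
        }
        return firstTime;
    }

    /** Messages expected in All Mail if deduplication is by Message-id only. */
    int expectedCount() {
        return seenIds.size();
    }

    /** Copies that were not uploaded because their id had been seen before. */
    int skippedCount() {
        return skipped;
    }

    public static void main(String[] args) {
        UploadTracker tracker = new UploadTracker();
        // Made-up ids: a list post saved in Sent and received again from the list.
        for (String id : new String[] {"<post@example.org>", "<other@example.org>", "<post@example.org>"}) {
            System.out.println(id + " upload=" + tracker.shouldUpload(id));
        }
        System.out.println("expected=" + tracker.expectedCount() + " skipped=" + tracker.skippedCount());
    }
}
```

Messages without a Message-id header need a separate rule, for example always uploading and always counting them; that choice is not covered here.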

> I would imagine that since I am working as a domain admin and with accounts with those privileges I would be able to change any of the above settings via an API call as well to get around such things but I don't think that will be necessary but something to think about.

Unfortunately there's no way to change or override the "hide label from IMAP" settings. This thread reminded me that I've been meaning to log an issue about it, which I did yesterday: http://code.google.com/a/google.com/p/apps-api-issues/issues/detail?id=2599
 
> I've never really dealt with IMAP via java but I suppose I can look into it.  I was really thinking that there would be a simple way to do this via existing API calls.  I'm trying to work this all into a semi-automated java application that will run on our e-mail messages store against directories output in maildir format via offlineImap.

I'd be very surprised if Java doesn't have an IMAP client library. The only tricky bit is sorting out the OAuth stuff, since you need an OAuth implementation that is suitably decoupled from the HTTP requests. It looks like Google already have a sample available, though I can't comment on its usefulness:


I wrote an example for Perl a while back, since that's the language I work in. I suppose it won't be particularly useful, but the structure might reveal some clues if you're struggling:


Cheers,
Rob.

Tomas Hajek

Jun 8, 2011, 4:24:17 PM
to google-app...@googlegroups.com
Hi Rob,
  Thanks for the additional information. I actually am tracking duplicate message-ids, but there must be more to it. In my migration, 199 messages are missing from "All Mail" in the web interface, yet my duplicate detection found 1354 unique duplicates with 3456 total duplicate message-ids. That is far more than 199, so either duplicate message-id filtering removes only a subset, or my duplicate detection is faulty.
  While reading the files from the file system, I look for the first occurrence of "Message-id:" and then the next <.*>, either on the same line as "Message-id:" or on the next line. That message id goes into a hash table as the key, and the value is updated whenever I find another occurrence of the same key.
  I will look at my code again and see if I have something odd there.
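For comparison, a minimal sketch of the extraction described above, with three guards that a plain text search can miss: it looks only at the header block (everything before the first blank line), it requires "Message-ID:" at the start of a header line so that a header such as "X-Original-Message-ID:" or text in the body is not matched, and it uses <[^<>\s]+> rather than the greedy <.*>. All names are made up for illustration, and this is not a full header parser.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts Message-IDs and tallies repeats.
final class MessageIdTally {
    private static final Pattern MESSAGE_ID = Pattern.compile(
            "^Message-ID:\\s*(<[^<>\\s]+>)", Pattern.CASE_INSENSITIVE);

    /** Returns the Message-ID from the header block, or empty if there is none. */
    static Optional<String> extractMessageId(String rawMessage) {
        StringBuilder header = new StringBuilder(); // current unfolded header
        for (String line : rawMessage.split("\r?\n", -1)) {
            boolean continuation = line.startsWith(" ") || line.startsWith("\t");
            if (continuation && header.length() > 0) {
                header.append(' ').append(line.trim()); // unfold a wrapped header
                continue;
            }
            Optional<String> id = match(header);
            if (id.isPresent() || line.isEmpty()) {
                return id; // found it, or reached the end of the headers
            }
            header.setLength(0);
            header.append(line);
        }
        return match(header); // message with no body and no trailing blank line
    }

    private static Optional<String> match(CharSequence header) {
        Matcher m = MESSAGE_ID.matcher(header);
        return m.lookingAt() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Occurrences per Message-ID; messages without one are ignored. */
    static Map<String, Integer> tally(List<String> rawMessages) {
        Map<String, Integer> counts = new HashMap<>();
        for (String raw : rawMessages) {
            extractMessageId(raw).ifPresent(id -> counts.merge(id, 1, Integer::sum));
        }
        return counts;
    }

    /** Number of distinct IDs that occur more than once. */
    static long duplicatedIds(Map<String, Integer> counts) {
        return counts.values().stream().filter(c -> c > 1).count();
    }

    /** Copies beyond the first for each ID. */
    static int surplusCopies(Map<String, Integer> counts) {
        return counts.values().stream().mapToInt(c -> c - 1).sum();
    }
}
```

If deduplication were purely by message id, surplusCopies is the figure to compare against the number of messages missing from All Mail, since the first copy of each id is still stored.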

thanks,
 -Tomas

Robert Norris

Jun 8, 2011, 4:41:40 PM
to google-app...@googlegroups.com
If you're checking the number of messages migrated in the web interface then you're probably getting thrown off because it displays the number of conversations rather than number of messages.

There might be more to it than message ID but in my experience (170000 accounts & many millions of messages migrated) it's been enough. That said, we were never relying on our message ID tracking but only using it to reduce the amount of mail we pushed up (local disk + CPU is faster than network + API latency). If the occasional duplicate got through we trusted that Gmail would sort it out, and we weren't disappointed.





Tomas Hajek

Jun 9, 2011, 10:42:38 AM
to google-app...@googlegroups.com
Hi Rob,
  Thanks again for the information. I am aware of the conversation view, and I turn it off when I check "All Mail" so that it shows me the total number of messages without skewing due to conversations.
  One reason I am concerned about duplicate detection, and believe there is more to it, is this: if a message that is 100% the same (based on an md5sum) resides in two or more IMAP folders in my existing system and I migrate it, Google will add the additional labels to the duplicate message. However, when the message-id was the same but the message was not 100% the same, whether due to headers or the body of the e-mail, in some instances it was duplicated and in others it wasn't. Since I've done many tests and have been modifying my migration program for a couple of weeks now, maybe I've missed something.
  I was really hoping that someone from Google would respond and tell me more specifically how duplicate message detection is done and handled. I put a support ticket in to Google, but they told me to post here. I do have another post asking for information about duplicates, but as yet I've had no response.
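The two cases described above (same message-id with identical content, and same message-id with differing content) can be separated before upload with a content digest. The sketch below is one way to do that and says nothing about how Gmail itself decides; the class and method names are made up for illustration. It records the MD5 digest of every copy per message-id and flags ids whose copies do not all hash the same.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: groups content digests by Message-ID.
final class DuplicateClassifier {
    // Message-ID -> distinct MD5 digests seen for that ID.
    private final Map<String, Set<String>> digestsById = new HashMap<>();

    /** Lower-case hex MD5 of the raw message bytes, like the md5sum tool prints. */
    static String md5Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    /** Records one copy of a message found in some folder. */
    void record(String messageId, byte[] rawMessage) {
        digestsById.computeIfAbsent(messageId, k -> new HashSet<>()).add(md5Hex(rawMessage));
    }

    /** True when at least two copies with this ID differ in content. */
    boolean hasConflictingCopies(String messageId) {
        return digestsById.getOrDefault(messageId, Set.of()).size() > 1;
    }
}
```

Ids flagged by hasConflictingCopies are the ones whose outcome was inconsistent in the tests described above, so they are the candidates to check by hand when the before and after counts disagree.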
  I very much appreciate your feedback and input on this, as well as on the issue of getting a count of messages. If you don't mind me asking, do you know what your average migration speed was? For example, running a migration with a single thread on my 1856MB mailbox (76802 messages) takes me about 10 hours on average. That works out to about 3MB per minute.
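The arithmetic behind that estimate, as a small sketch using only the figures quoted above (the class and method names are made up): 1856 MB over 10 hours is roughly 3.1 MB per minute, and 76802 messages over the same period is roughly 128 messages per minute.

```java
import java.util.Locale;

// Hypothetical helper for comparing migration runs.
final class MigrationRate {
    /** Average amount processed per minute over a run lasting the given hours. */
    static double perMinute(double amount, double hours) {
        return amount / (hours * 60.0);
    }

    public static void main(String[] args) {
        double mbPerMinute = perMinute(1856, 10);
        double messagesPerMinute = perMinute(76802, 10);
        System.out.println(String.format(Locale.ROOT, "%.2f MB/min", mbPerMinute));
        System.out.println(String.format(Locale.ROOT, "%.0f messages/min", messagesPerMinute));
    }
}
```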
-Tomas

Robert Norris

Jun 9, 2011, 4:42:57 PM
to google-app...@googlegroups.com
Hi Tomas,

I'm afraid I have nothing further to offer regarding the message counts, other than to say that we also noticed various inconsistencies and quirks relating to duplicates, labels, and other things. It's been over six months since we completed our migrations, so I'm sketchy on the details. I'll see if I can find out anything else about our workarounds, but it sounds like you have things well in hand.

I can't remember our exact throughput, but we used to tell our customers to expect a wait of about two hours for an "average" mailbox, which was about 300MB. It usually completed much faster than that though. It should be noted that this was using the old "batch" migration API, which is 4-10x faster than the current multipart API (a fact I once confirmed with support).

Cheers,
Rob.


