Email Schema?

Ross Schulman

Jan 2, 2015, 10:17:23 PM
to camli...@googlegroups.com
Is there an agreed-upon schema for emails stored in Camlistore? I'm writing a script that postfix on my VPS will call whenever I receive an email, in addition to delivering it to my normal mail store (for now at least, until I can build an entire email infrastructure around Camlistore). Before I go down that route, though, I'd like to make sure I insert the emails in a community-agreed-upon way.
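A minimal sketch of such a delivery hook, assuming Postfix's local delivery pipes each message to the script on stdin (for example via a `|/path/to/script` entry in `.forward` or aliases) and assuming Camlistore-style `sha1-` blobrefs. It keeps the raw bytes untouched, uses `net/mail` only as a sanity check, and leaves the actual upload out; `blobRef` is a hypothetical helper name.

```go
package main

import (
	"bytes"
	"crypto/sha1"
	"fmt"
	"io"
	"net/mail"
	"os"
)

// blobRef returns a Camlistore-style content address ("sha1-" + hex digest)
// of the exact bytes given. SHA-1 refs are an assumption of this sketch.
func blobRef(b []byte) string {
	return fmt.Sprintf("sha1-%x", sha1.Sum(b))
}

func main() {
	// Postfix pipes the complete message (headers and body) to stdin.
	raw, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read:", err)
		os.Exit(75) // EX_TEMPFAIL (sysexits.h): ask Postfix to retry later
	}
	// Parse only as a sanity check; the bytes to store are the raw ones.
	msg, err := mail.ReadMessage(bytes.NewReader(raw))
	if err != nil {
		fmt.Fprintln(os.Stderr, "not an RFC 822 message:", err)
		os.Exit(65) // EX_DATAERR (sysexits.h): a permanent failure
	}
	fmt.Printf("%s %q\n", blobRef(raw), msg.Header.Get("Subject"))
	// Uploading raw to Camlistore (for example with camput) is left out.
}
```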

I've seen some of the work that was done a while back on a Gmail importer, but it doesn't look like anything came of it.

Is there a schema out there or should I throw up a strawman to see where we end up?

-Ross

Brad Fitzpatrick

Jan 2, 2015, 10:23:54 PM
to camli...@googlegroups.com
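There was a discussion and a doc sent around earlier but it was never finished.  

The doc was https://docs.google.com/document/d/1pXJxjHEz2yXxJ8MYmft7ujjhWmaH30LCrBdjCAKgwMU/edit?usp=sharing
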
If you're interested in working on this, I'll try to prioritize coming up with a plan.

--
You received this message because you are subscribed to the Google Groups "Camlistore" group.
To unsubscribe from this group and stop receiving emails from it, send an email to camlistore+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ross Schulman

Jan 2, 2015, 10:31:26 PM
to camli...@googlegroups.com
I did see that. I had two concerns with it (not dealbreakers, just concerns):
1) It seemed overly focused on importing from Gmail, and Gmail is not a universal standard. I think it would be better to focus on a schema for "email-in-general", not just "email-qua-gmail".
2) It focused on being an importer built into Camlistore. That is not in and of itself a problem, but I wonder if we're focusing too much on importers that run inside the walls rather than on a robust API that other programs can hook into. I guess it's the difference between the Unix way and its alternatives.

Brad Fitzpatrick

Jan 2, 2015, 11:20:05 PM
to camli...@googlegroups.com
I have no desire at all to do Gmail only.  I have tons of pre-Gmail email.  And every Gmail message has a normal RFC 822-ish message behind it, so we'd use the normal email schema even for Gmail.
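Because every message, Gmail or not, is an RFC 822-style message, one generic parser covers both. A small sketch using Go's standard `net/mail` package; `headerSummary` is a hypothetical helper, not part of any Camlistore API:

```go
package main

import (
	"fmt"
	"net/mail"
	"strings"
)

// headerSummary parses an RFC 822-style message and returns the sender and
// recipient addresses. Nothing here is Gmail-specific.
func headerSummary(raw string) (from string, to []string, err error) {
	msg, err := mail.ReadMessage(strings.NewReader(raw))
	if err != nil {
		return "", nil, err
	}
	f, err := mail.ParseAddress(msg.Header.Get("From"))
	if err != nil {
		return "", nil, err
	}
	rcpts, err := msg.Header.AddressList("To")
	if err != nil {
		return "", nil, err
	}
	for _, a := range rcpts {
		to = append(to, a.Address)
	}
	return f.Address, to, nil
}

func main() {
	raw := "From: Ross <ross@example.com>\r\n" +
		"To: list@example.org, Brad <brad@example.net>\r\n" +
		"Subject: Email Schema?\r\n" +
		"\r\n" +
		"Is there an agreed-upon schema?\r\n"
	from, to, err := headerSummary(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(from, to)
}
```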


Fabian

Jan 30, 2015, 1:38:50 PM
to camli...@googlegroups.com, br...@danga.com
First of all, apologies for not finding the time to continue this back then. 
However, I wrote down some specifics about storing the raw email:


There's probably no point to use the gmail API at all as we a) want to store the raw email anyway and b) don't want to limit ourselves to gmail.
I think storing base64-encoded contents decoded back to their original form is worth the effort. Finding images or PDFs we received some time ago and seeing them as such in the UI is certainly a nice feature.
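A sketch of that decoding step, assuming a single-level `multipart/*` message (nested multiparts are not descended into); `decodedParts` and `attachment` are hypothetical names. Go's `mime/multipart` transparently decodes quoted-printable parts but not base64 ones, so base64 bodies are wrapped in a decoder here:

```go
package main

import (
	"encoding/base64"
	"fmt"
	"io"
	"mime"
	"mime/multipart"
	"net/mail"
	"strings"
)

// attachment holds one decoded MIME part.
type attachment struct {
	Name, ContentType string
	Data              []byte
}

// decodedParts walks the top level of a multipart message and returns each
// part with base64 bodies decoded back to their original bytes.
func decodedParts(raw string) ([]attachment, error) {
	msg, err := mail.ReadMessage(strings.NewReader(raw))
	if err != nil {
		return nil, err
	}
	mediaType, params, err := mime.ParseMediaType(msg.Header.Get("Content-Type"))
	if err != nil || !strings.HasPrefix(mediaType, "multipart/") {
		return nil, fmt.Errorf("not a multipart message: %q", mediaType)
	}
	mr := multipart.NewReader(msg.Body, params["boundary"])
	var out []attachment
	for {
		p, err := mr.NextPart()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		var r io.Reader = p // quoted-printable is already decoded by NextPart
		if strings.EqualFold(p.Header.Get("Content-Transfer-Encoding"), "base64") {
			r = base64.NewDecoder(base64.StdEncoding, p)
		}
		data, err := io.ReadAll(r)
		if err != nil {
			return nil, err
		}
		out = append(out, attachment{Name: p.FileName(), ContentType: p.Header.Get("Content-Type"), Data: data})
	}
}

// sample is a made-up message with a text part and a base64 "PDF" attachment.
const sample = "From: fabian@example.com\r\n" +
	"MIME-Version: 1.0\r\n" +
	"Content-Type: multipart/mixed; boundary=XYZ\r\n" +
	"\r\n" +
	"--XYZ\r\n" +
	"Content-Type: text/plain\r\n" +
	"\r\n" +
	"See attached.\r\n" +
	"--XYZ\r\n" +
	"Content-Type: application/pdf\r\n" +
	"Content-Disposition: attachment; filename=\"a.pdf\"\r\n" +
	"Content-Transfer-Encoding: base64\r\n" +
	"\r\n" +
	"JVBERi0=\r\n" +
	"--XYZ--\r\n"

func main() {
	parts, err := decodedParts(sample)
	if err != nil {
		panic(err)
	}
	for _, p := range parts {
		fmt.Printf("%q %s %d bytes\n", p.Name, p.ContentType, len(p.Data))
	}
}
```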

Storing the raw mail is certainly the most important issue. There are lots of options for denormalisation and referencing.
Feedback for both would be great.

Ivan Vučica

Feb 1, 2015, 4:12:01 AM
to camli...@googlegroups.com, br...@danga.com

"There's probably no point to use the gmail API at all"

I interpret the above as "no point to use Gmail extensions to IMAP".

I don't agree, as one could import just the "all mail" folder and read the labels using Gmail's extension to IMAP, which would be nicer than just importing all folders.

Additionally, OAuth is really nice for authentication, but Gmail-specific.

Surely you didn't mean that avoiding these Gmailisms is good even when the server supports them, but rather that the proposed schema should not /depend/ on them (which I agree with)?

Fabian

Feb 1, 2015, 5:04:16 AM
to camli...@googlegroups.com, br...@danga.com
I should have elaborated on that statement as your interpretation is quite the opposite of what I meant.
I think that there is no point in using the Gmail API, since the relevant features (including OAuth) are all available via IMAP as well. So improving a default IMAP importer by using these seems more reasonable than using the API on top or writing an additional importer.

Ivan Vučica

Feb 2, 2015, 4:40:18 PM
to camli...@googlegroups.com, br...@danga.com

Thanks for clarification! :)

Phil Mocek

Feb 12, 2015, 6:04:47 PM
to camli...@googlegroups.com
Brad Fitzpatrick wrote:
> There was a discussion and a doc sent around earlier but it was never
> finished.
>
> The doc was
> https://docs.google.com/document/d/1pXJxjHEz2yXxJ8MYmft7ujjhWmaH30LCrBdjCAKgwMU/edit?usp=sharing
>
> If you're interested in working on this, I'll try to prioritize coming up
> with a plan.

I'm interested in this discussion. Searches of my own copy of the list
archives and of the archives available on the Web didn't turn up the
past discussion on this list. Where can I read it?

It would be disappointing not to end up with the original RFC 2822
messages after storing them in and retrieving them from Camlistore. It's
a simple and clearly defined format that has not changed much for decades
(RFC 822) and only minimally since 2001 (RFC 2822, revised as RFC 5322 in
2008). Skimming the document Brad referenced, I'm concerned that this
will not be the case.
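This requirement can be checked mechanically with content addressing: if the retrieved bytes still hash to the blobref recorded at import time, the message came back byte for byte. A sketch with a hypothetical in-memory `memStore` standing in for Camlistore, assuming `sha1-` blobrefs:

```go
package main

import (
	"crypto/sha1"
	"errors"
	"fmt"
)

// memStore is a hypothetical stand-in for a content-addressed blob store.
type memStore map[string][]byte

func ref(b []byte) string { return fmt.Sprintf("sha1-%x", sha1.Sum(b)) }

// put stores a copy of b and returns its blobref.
func (s memStore) put(b []byte) string {
	r := ref(b)
	s[r] = append([]byte(nil), b...)
	return r
}

// getVerified returns the blob for r only if its bytes still hash to r,
// i.e. the message comes back exactly as it went in.
func (s memStore) getVerified(r string) ([]byte, error) {
	b, ok := s[r]
	if !ok {
		return nil, errors.New("not found: " + r)
	}
	if ref(b) != r {
		return nil, errors.New("corrupted: " + r)
	}
	return b, nil
}

func main() {
	s := memStore{}
	raw := []byte("Subject: hi\r\n\r\nbody\r\n")
	r := s.put(raw)
	got, err := s.getVerified(r)
	fmt.Println(r, err == nil && string(got) == string(raw))
}
```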

An Internet e-mail message is just a set of key-value pairs (the header
fields; some required, some optional) and a payload of arbitrary text
(the body). Almost any software written to work with such messages knows
how to read multipart MIME bodies and, when a message is of type
multipart/alternative, to present whichever alternative the user likely
prefers. Breaking a message body into multiple parts for storage seems
an odd choice.
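The client-side choice described here follows RFC 2046, which orders the parts of a `multipart/alternative` body from least to most faithful to the original, so a reader shows the last part it can render. A sketch over the parts' Content-Type values; `preferredAlternative` is a hypothetical helper:

```go
package main

import (
	"fmt"
	"mime"
)

// preferredAlternative returns the index of the part to display from a
// multipart/alternative body, given each part's Content-Type in message
// order. The last part whose media type the client can render wins;
// -1 means none is renderable.
func preferredAlternative(partTypes []string, canRender map[string]bool) int {
	best := -1
	for i, ct := range partTypes {
		mediaType, _, err := mime.ParseMediaType(ct)
		if err == nil && canRender[mediaType] {
			best = i
		}
	}
	return best
}

func main() {
	parts := []string{"text/plain; charset=utf-8", "text/html; charset=utf-8"}
	fmt.Println(preferredAlternative(parts, map[string]bool{"text/plain": true}))
	fmt.Println(preferredAlternative(parts, map[string]bool{"text/plain": true, "text/html": true}))
}
```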

The specialization for Gmail seems rational for a Gmail importer
(assuming that IMAP is undesirable as the method of retrieval), but not
for the storage of messages once they've been retrieved.

--
Phil Mocek
https://mocek.org

Fabian

Feb 12, 2015, 6:25:35 PM
to camli...@googlegroups.com, phil-...@mocek.org
Hi Phil,

I completely agree with you. This first document was written from the wrong point of view and is basically useless. I posted a new one in the issue tracker:

Of course it is entirely possible to simply store the raw email as a single file and be done with it. However, I think having the raw email available as well as attachments showing up as regular files I own is a decent feature. So that an attached image actually appears among my images, e.g., in the UI. In that sense I don't consider it odd to split a multipart body into its parts.
I did a test implementation of the approach described in this document. It indicates that we can store the raw emails byte-exact (needed for DKIM support, as Brad pointed out) and also store the base64 contents decoded for all remotely correct emails, with a fallback for the few malformed ones.
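One way to read "decoded, with a fallback for malformed emails" as code: try to decode each base64 part, and if the encoding is broken, keep the part's bytes as received. This is a sketch under that assumption, not Fabian's actual test implementation; `decodeOrKeep` and `storedPart` are hypothetical names:

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// storedPart describes how one MIME part would be kept next to the raw message.
type storedPart struct {
	Data    []byte // decoded bytes, or the part's bytes as received on fallback
	Decoded bool   // false if the base64 body was malformed
}

// decodeOrKeep tries to turn a base64 part body back into its original file.
// If the encoding is malformed it keeps the bytes as received instead; the
// full raw message is stored separately either way, so nothing is lost.
// StdEncoding ignores embedded \r and \n, so wrapped base64 lines decode fine.
func decodeOrKeep(body []byte) storedPart {
	data, err := base64.StdEncoding.DecodeString(string(body))
	if err != nil {
		return storedPart{Data: body, Decoded: false}
	}
	return storedPart{Data: data, Decoded: true}
}

func main() {
	for _, body := range []string{"aGVs\r\nbG8=\r\n", "@@not base64@@"} {
		p := decodeOrKeep([]byte(body))
		fmt.Printf("%q decoded=%v\n", p.Data, p.Decoded)
	}
}
```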

Feedback on this document would be appreciated.

Ivan Vučica

Feb 12, 2015, 7:25:08 PM
to camli...@googlegroups.com, phil-...@mocek.org
On Thu, Feb 12, 2015 at 11:25 PM, Fabian <fab.re...@gmail.com> wrote:
> However, I think having the raw email available as well as attachments showing up as regular files I own is a decent feature. So that an attached image actually appears among my images, e.g., in the UI. In that sense I don't consider it odd to split a multipart body into its parts.

I'd strongly prefer to just store the original email as it is delivered to the server. The indexer, a (hypothetical) Camlistore-backed IMAP server and the mail clients should be the ones peeking deeper into multipart emails.

Regarding storing 'title', 'to' et al. as attributes in the permanode, as in the original proposal and in Fabian's doc: 
Please excuse my possible lack of understanding of The Ways of Camlistore, but while it seems reasonable, wouldn't that needlessly create a bunch of 'mutation' blobs, one per attribute, each based on a header that is (and should be) essentially static and immutable? That is, would the importer then behave like the Twitter importer, which creates an enormous number of blobs? Is this required for easier indexing?
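One alternative to writing an attribute claim per header is for the indexer to derive those fields from the immutable raw message whenever it indexes it. A sketch of that derivation; `indexFields` and the field names (taken from the 'title' and 'to' attributes mentioned above) are illustrative assumptions, not an agreed schema:

```go
package main

import (
	"fmt"
	"net/mail"
	"strings"
	"time"
)

// indexFields derives searchable fields from the raw message at index time,
// so no per-header attribute claims need to be written. Header lookups are
// case-insensitive; a missing or unparseable Date simply yields no "date".
func indexFields(raw string) (map[string]string, error) {
	msg, err := mail.ReadMessage(strings.NewReader(raw))
	if err != nil {
		return nil, err
	}
	h := msg.Header
	fields := map[string]string{
		"title": h.Get("Subject"),
		"from":  h.Get("From"),
		"to":    h.Get("To"),
	}
	if d, err := h.Date(); err == nil {
		fields["date"] = d.UTC().Format(time.RFC3339)
	}
	return fields, nil
}

func main() {
	raw := "From: ivan@example.com\r\n" +
		"To: list@example.org\r\n" +
		"Subject: Re: Email Schema?\r\n" +
		"Date: Thu, 12 Feb 2015 19:25:08 -0500\r\n" +
		"\r\n" +
		"body\r\n"
	fields, err := indexFields(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(fields)
}
```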

And to provoke some chatter: 
Would it make sense to instead just store emails as a bunch of .eml files (plus, if that is required for easy indexing, extra attributes on a permanode, as I was wondering above)? Would it be possible to provide 'import as .eml only' as an option? I generally find tagging, nice indexing and blob syncing the most important things a Camlistore-backed server would offer. Wouldn't storing emails as RFC (2)822 .eml-formatted files be reasonable for that?
--
Ivan Vučica

Phil Mocek

Feb 12, 2015, 8:08:04 PM
to camli...@googlegroups.com
Ivan Vučica wrote:
> Would it make sense to instead just store emails as a bunch of .eml files

Eek! Please, no proprietary formats. E-mail messages are just plain
text. There's no need to muck them up as Outlook would.

Ivan Vučica

Feb 12, 2015, 8:09:21 PM
to camli...@googlegroups.com
Uh -- proprietary? But .eml is just plaintext rfc(2)822, no?

--
Ivan Vučica

Tamás Gulácsi

Feb 13, 2015, 9:08:10 AM
to camli...@googlegroups.com

I think Phil meant Outlook-only .msg files. And yes, they're awful. The worst part is that I can't get the original ASCII email out of Outlook.
That is exactly what a simple .eml file is.

Stephen Searles

Apr 11, 2016, 12:51:37 AM
to Camlistore
I've been thinking about email and camlistore lately. Are there any recent thoughts on this topic? I played around with a bit of code considering something like an smtp server interfacing with camlistore to store and retrieve messages. So far I just played with interacting with net/mail and the camlistore cli tools.

I was wondering about an importer for generic email, but I got a little stuck on whether that'd be practical for supporting real email UIs. Correct me if I'm wrong: can importers only run periodically? I think that could still work, but only if it's kosher for an importer to, say, spin off a few goroutines and serve on the SMTP port. Pushing content into the actual data store could still happen periodically, just fairly frequently (every minute or less, maybe?). What's the thinking there?

As far as schema, I missed a couple of the docs here til now, but what I ended up with looks kinda like a messier version of Brad's doc, structurally speaking, but focusing on email as a standard format.

Tamás Gulácsi

Apr 11, 2016, 1:47:00 AM
to Camlistore
On Monday, April 11, 2016 at 6:51:37 UTC+2, Stephen Searles wrote:
> I've been thinking about email and camlistore lately. Are there any recent thoughts on this topic? I played around with a bit of code considering something like an smtp server interfacing with camlistore to store and retrieve messages. So far I just played with interacting with net/mail and the camlistore cli tools.
>
> I was wondering about an importer for generic email, but I got a little stuck on if that'd be practical for supporting real email UIs. Correct me if I'm wrong: are importers able to only run periodically? I think that could still work, but only if it's kosher that an importer can, say, spin off a few goroutines and serve on the smtp port. For pushing content into the actual data store, it could still run periodically, just fairly frequently. (minute or less maybe?) What's the thought there?
>
> As far as schema, I missed a couple of the docs here til now, but what I ended up with looks kinda like a messier version of Brad's doc, structurally speaking, but focusing on email as a standard format.

An email importer would slurp all the emails from the configured mailbox on the configured IMAP server, so you could put all the emails you want to keep in an "Archive" folder, and Camlistore would just slurp and store all of them.
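Re-slurping the same folder on every run is cheap if imports are keyed by content: identical raw bytes produce the same blobref, so already-stored messages are skipped. A sketch with hypothetical `seen` and `upload` stand-ins for the importer's state and the blob store (a real importer would likely also track IMAP UIDs):

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// importNew uploads only messages whose raw bytes have not been seen before
// and returns how many it uploaded. Because refs are derived from content,
// re-running over the same folder finds the same refs and uploads nothing.
func importNew(msgs [][]byte, seen map[string]bool, upload func(ref string, raw []byte)) int {
	n := 0
	for _, raw := range msgs {
		ref := fmt.Sprintf("sha1-%x", sha1.Sum(raw))
		if seen[ref] {
			continue
		}
		seen[ref] = true
		upload(ref, raw)
		n++
	}
	return n
}

func main() {
	seen := map[string]bool{}
	folder := [][]byte{[]byte("Subject: a\r\n\r\n"), []byte("Subject: b\r\n\r\n")}
	up := func(ref string, _ []byte) { fmt.Println("upload", ref) }
	fmt.Println("first pass:", importNew(folder, seen, up))
	fmt.Println("second pass:", importNew(folder, seen, up))
}
```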

An SMTP server should be a separate program (at least a Camlistore "app") that listens as an SMTP server and shovels everything it gets into Camlistore.
If you really want, you can pair it with an IMAP server that serves from Camlistore, so you can manage all your stored emails with any IMAP-capable email client!

Stephen Searles

Apr 11, 2016, 12:07:30 PM
to Camlistore
Awesome. That's kind of what I suspected, and why I started with the CLI tools. Is that the recommended interface for "apps" like this? Or is there an HTTP API or something?
