Parsing inbound email into an application

Lee Irving

May 3, 2013, 5:47:43 AM
to rubyno...@googlegroups.com
I am having great fun parsing emails into an internal application and wondered if anyone has experience or any useful resources.

The problem is not receiving the email but deciding how to extract the body of the email and then make it safe and viewable within the application.

Currently using a combination of Nokogiri, RedCloth, force_encoding and other tricks to process them into a usable format.

Suggestions please.

Paul Callaghan

May 3, 2013, 6:22:09 AM
to rubyno...@googlegroups.com
Aren't there standard Ruby libraries based on the RFCs? I know Perl
has quite a few. Then use an HTML sanitizer.

Luke Brown

May 3, 2013, 6:48:27 AM
to rubyno...@googlegroups.com
Hi Lee, we have done quite a bit of work on this at CustomersSure. Unfortunately it's baked into our app at the minute. We hope to extract it in the future, but in the meantime we could meet and compare code, techniques etc? Thoughtbot have released a gem, Griddler (http://robots.thoughtbot.com/post/42286882447/handle-incoming-email-with-griddler), which they extracted from their app. Although it has some nice patterns (that I might borrow/steal for a refactor), it doesn't do much of what we need, namely extracting the original sender of a forwarded email, the message content from forwards, reply sections, inline replies etc.

The business logic is the easy part, but slicing out what you need first is quite a challenge. We are homing in on a decent solution which works most of the time.
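
To give a flavour of the slicing problem, here is a minimal stdlib-only sketch (not our production code). It assumes a plain-text body, a Gmail-style "On ... wrote:" attribution line and a dashed "Forwarded message" / "Original Message" marker. The names strip_quoted_reply and forwarded_sender are made up for illustration, and real mail clients vary far more than these two patterns cover.

```ruby
# Hypothetical helpers for illustration; the patterns are assumptions
# about common mail clients, not an exhaustive list.
REPLY_HEADER   = /\AOn .+ wrote:\s*\z/
FORWARD_MARKER = /\A-+\s*(Original|Forwarded) message\s*-+\s*\z/i

# Keep only the newly written part of a plain-text reply.
def strip_quoted_reply(body)
  kept = []
  body.each_line do |line|
    stripped = line.strip
    # Everything from an attribution line or forward marker onwards is quoted.
    break if stripped.match?(REPLY_HEADER) || stripped.match?(FORWARD_MARKER)
    next if stripped.start_with?(">") # drop inline quoted lines
    kept << line.chomp
  end
  kept.join("\n").strip
end

# Return the address on the first "From:" line after a forward marker,
# or nil when the body does not look like a forward.
def forwarded_sender(body)
  lines = body.lines.map(&:strip)
  start = lines.index { |l| l.match?(FORWARD_MARKER) }
  return nil unless start
  from = lines.drop(start + 1).find { |l| l.start_with?("From:") }
  from && from[/[^\s<>]+@[^\s<>]+/]
end
```

Calling strip_quoted_reply on a reply leaves just the text written above the quoted section, while forwarded_sender pulls the address out of the forwarded header block. Localised attribution lines ("Le ... a écrit :") and HTML-only mails need extra handling.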

In terms of making it safe, https://github.com/rgrove/sanitize helps. We also convert HTML to plain text when possible, similar to:
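
For illustration only (this is not the code from the post), a rough stdlib-only sketch of an HTML-to-plain-text step. The helper name html_to_text and the small entity table are assumptions. Regex stripping is naive, so a real parser such as Nokogiri would be more robust on malformed markup.

```ruby
# Hypothetical helper: a naive HTML-to-text conversion using only regexes.
# The entity table is deliberately tiny and an assumption of this sketch.
ENTITIES = {
  "&amp;" => "&", "&lt;" => "<", "&gt;" => ">",
  "&quot;" => "\"", "&#39;" => "'", "&nbsp;" => " "
}.freeze

def html_to_text(html)
  text = html.dup
  text.gsub!(/<(script|style)\b.*?<\/\1>/mi, "")   # drop non-visible content
  text.gsub!(/<br\s*\/?>/i, "\n")                  # line breaks
  text.gsub!(/<\/(p|div|li|tr|h[1-6])>/i, "\n\n")  # block ends become blank lines
  text.gsub!(/<[^>]+>/, "")                        # remove remaining tags
  # Decode entities in a single pass so "&amp;lt;" is not unescaped twice.
  text.gsub!(/&(?:amp|lt|gt|quot|#39|nbsp);/) { |e| ENTITIES[e] }
  text.gsub!(/[ \t]+/, " ")                        # collapse runs of spaces
  text.gsub!(/ *\n */, "\n")                       # trim around newlines
  text.gsub!(/\n{3,}/, "\n\n")                     # at most one blank line
  text.strip
end
```

The result is plain text, not sanitized HTML: decoded entities can reintroduce angle brackets, so it still needs escaping when rendered in a page.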