I'm trying to create a crude mail-viewing app using the
mail gem.
One of the issues I'm running into is that .decoded returns field
values as raw bytes in whatever encoding they arrived in, and the
charset information is then lost, as far as I can tell.
Here's an example of what I mean:
require 'iconv'
require 'rubygems'
require 'mail'
s = Mail::SubjectField.new("From", 'Subject: =?ISO-8859-1?Q?Re=3A_ol=E1?=')
s.decoded # => "Re: ol\341" (LATIN1 bytes)
Iconv.conv("UTF8", "LATIN1", s.decoded) # => "Re: ol\303\241" (UTF8 bytes)
Mail::Encodings.unquote_and_convert_to(s.value, 'UTF8') # => "Re: ol\303\241" (UTF8 bytes)
Also as a gist here: http://gist.github.com/258544
So manually converting works for this example, but shouldn't it
happen automatically on decode?
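
To make the question concrete, here is a minimal sketch of what "decode and convert in one step" could look like for a single Q-encoded word. It is not the mail gem's implementation: decode_q_word is a hypothetical helper, and it assumes a Ruby with built-in string encodings (1.9 or later, using String#b from 2.0) rather than Iconv.

```ruby
# Hypothetical helper, not part of the mail gem: decodes one RFC 2047
# Q-encoded word and transcodes it to UTF-8 using the charset it names.
def decode_q_word(word)
  match = /\A=\?([^?]+)\?[Qq]\?([^?]*)\?=\z/.match(word)
  return word unless match # not an encoded word, hand it back untouched

  charset = match[1]
  text    = match[2]
  # In Q-encoding "_" stands for a space and "=XX" for the byte 0xXX.
  # Work on a binary copy so raw bytes above 0x7F can be inserted safely.
  bytes = text.tr('_', ' ').b.gsub(/=([0-9A-Fa-f]{2})/) { $1.hex.chr }
  # Tag the raw bytes with the charset they arrived in, then transcode.
  bytes.force_encoding(charset).encode('UTF-8')
end

subject = decode_q_word('=?ISO-8859-1?Q?Re=3A_ol=E1?=')
puts subject
```

The point of the sketch is that the charset is consumed at the same moment the bytes are unquoted, so nothing downstream has to guess it.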
For example, address.rb uses decode internally and this means that if
one uses any of the handy accessor methods, information about the
encoding is lost and the accessors are useless.
Opinions, ideas?
Thanks,
Cristi
Get the latest copy of mail by cloning it from GitHub.
>> require 'lib/mail'
=> true
>> s = Mail::SubjectField.new("From", 'Subject: =?ISO-8859-1?Q?Re=3A_ol=E1?=')
=> #<Mail::SubjectField:0x102184d38 @name="Subject", @tree=nil,
@length=nil, @value="=?ISO-8859-1?Q?Re=3A_ol=E1?=", @element=nil>
>> s.decoded
=> "Re: ol\341"
Mikel
--
http://lindsaar.net/
Rails, RSpec and Life blog....
Thanks for the amazingly fast reply and fix. However, I'm a bit
confused now because the fix fixes something I didn't intend to
report :).
What I wanted to ask was whether .decoded should in fact use
Mail::Encodings.unquote_and_convert_to to avoid losing the encoding on
the text?
Right now, if you use .decoded, there's no way to safely convert the
resulting bytes to UTF8 because their encoding is not known anymore.
Actually, this also goes for decoding the body of a message. I have to
do this to get the proper body content as UTF8. Shouldn't this
automagically happen in .decoded?
class Message
  # Decode the body, then convert it from the message's charset to `encoding`.
  def decoded_and_converted_to(encoding = 'UTF8')
    Iconv.conv(encoding, charset, body.decoded)
  end
end
Cristi
Ok, I see your problem
In Ruby 1.9 this is a moot problem because the encoding is embedded in the text.
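
As a rough illustration of that point, using only core String methods (whether mail actually tags the decoded string with the right charset is an assumption here, not something shown in this thread):

```ruby
# The same Latin-1 bytes that .decoded returned in the example above,
# tagged with the charset they arrived in.
latin1 = String.new("Re: ol\xE1", encoding: 'ISO-8859-1')

# In 1.9+ the string itself carries its encoding...
puts latin1.encoding

# ...so converting needs no out-of-band charset information.
utf8 = latin1.encode('UTF-8')
puts utf8.encoding
puts utf8.bytes.last(2).inspect # the two UTF-8 bytes that encode the final character
```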
For 1.8.x, maybe I could add another method that gives you the
encoding the string is in...
Lemmie think about it :)
Mikel