Yaml rails encoding woes


Ragav Satish

Jul 12, 2012, 12:52:23 AM
to pdxruby
So I've encountered this funny problem with Rails, YAML, and encoding. Does anyone have an idea what might be happening here?
 
ruby -v  -> ruby 1.9.3p194
rails -v -> Rails 3.2.5

I have this rather simple piece of code. All it does is create a string and dump it out with YAML.

--------- test.rb------------
#coding: utf-8
require 'yaml'
puts "Running yaml #{YAML::VERSION} engine #{YAML::ENGINE.yamler}"
s = "español"  
puts "Encoding #{s.encoding}" 
File.open("test.yml", "w") do |f|
  f << s.to_yaml           # works
  YAML::dump(s, $stdout)   # works
  puts YAML::dump(s, f)    # fails under "rails runner" (error below)
end
----------------------------------

The funky bit is that:

a) "ruby test.rb" works, but
b) "rails runner test.rb" doesn't [1/psych/visitors/emitter.rb:20:in `write': "\xC3" from ASCII-8BIT to UTF-8 (Encoding::UndefinedConversionError)]

So somewhere deep in the bowels of YAML, it appears that the original string encoding is lost. What's surprising is that without Rails this works just fine, so it doesn't seem to be a YAML problem per se.
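
A minimal sketch of one plausible explanation, resting on two assumptions the thread does not confirm: (a) Psych hands IO#write a string labelled ASCII-8BIT, which is what the error message says, and (b) the relevant difference under "rails runner" is that Rails 3 sets Encoding.default_external and Encoding.default_internal from config.encoding (UTF-8 by default). With no default internal encoding, a file opened with plain "w" does no transcoding; with both defaults set to UTF-8, the same write raises the same error. The payload is simulated, and the helper name try_write is made up for illustration.

```ruby
# coding: utf-8
require 'tmpdir'

# Assumption: the emitter passes IO#write a binary (ASCII-8BIT) string,
# as the error message suggests. Simulate that by relabelling UTF-8 bytes.
payload = "español".dup.force_encoding(Encoding::ASCII_8BIT)

# Hypothetical helper: open a file with plain "w" (no explicit encoding)
# and write the payload. Returns nil on success, or the conversion error.
def try_write(dir, payload)
  File.open(File.join(dir, 'test.yml'), 'w') { |f| f.write(payload) }
  nil
rescue Encoding::UndefinedConversionError => e
  e
end

saved = [Encoding.default_external, Encoding.default_internal]
results = {}
begin
  Dir.mktmpdir do |dir|
    # Plain Ruby: no default_internal, so "w" files do no transcoding.
    Encoding.default_internal = nil
    results[:plain] = try_write(dir, payload)

    # Assumed Rails-like setup: config.encoding = "utf-8" sets both
    # defaults, which gives "w" files a UTF-8 external encoding.
    Encoding.default_external = Encoding::UTF_8
    Encoding.default_internal = Encoding::UTF_8
    results[:rails_like] = try_write(dir, payload)
  end
ensure
  # Restore the process-wide defaults changed above.
  Encoding.default_external, Encoding.default_internal = saved
end

puts "plain:      #{results[:plain].inspect}"
puts "rails-like: #{results[:rails_like].inspect}"
```

If this assumption holds, it would also explain why the failure shows up only under Rails even though the YAML code is identical.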

--Regards
--Ragav

markus

Jul 12, 2012, 1:46:13 AM
to pdx...@googlegroups.com
Some thoughts (no promises that they'll help):

1. Rails messes with a lot of core behavior, and has historically messed with
serialization (JSON, YAML, etc.) in ways that cause problems like this.

2. If you have a way that works reliably (f << x.to_yaml) hold on to
that, but be sure to check that the serialized output is correct (and
can be reloaded).

3. You may want to play with the options to File.open, specifically
external_encoding, as that seems most directly applicable to the
error you're getting, e.g. something like:

- File.open("test.yml", "w") do |f|
+ File.open("test.yml", "w", external_encoding: 'utf-8') do |f|

4. Good luck, and please let us know what you discover.
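
A sketch of the check suggested in point 2: serialize with to_yaml, append the resulting String to the file, then reload it and compare. It writes to a temporary directory instead of test.yml so it can run anywhere; on a current Ruby with Psych, YAML.load_file should return an equal UTF-8 String.

```ruby
# coding: utf-8
require 'yaml'
require 'tmpdir'

s = "español"
loaded = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, 'test.yml')

  # Serialize to a String first, then append that String to the file
  # (the "f << s.to_yaml" variant reported as working).
  File.open(path, 'w') { |f| f << s.to_yaml }

  # Point 2: make sure the file can be reloaded and matches the original.
  loaded = YAML.load_file(path)
end

puts loaded == s
puts loaded.encoding
```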

-- M


Ragav Satish

Jul 12, 2012, 2:47:15 AM
to pdx...@googlegroups.com
Thanks Markus,

I tried File.open('test.yml', 'w:utf-8') when I first encountered this issue, but it makes no difference. I'll keep the list posted on what I find (if anything).
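
That result is consistent with the sketch below, assuming (as the error message suggests) that the bytes reaching IO#write are labelled ASCII-8BIT. Asking for UTF-8 explicitly makes IO#write transcode ASCII-8BIT to UTF-8, which fails on byte 0xC3 even in plain Ruby, while binary mode ("wb") sets the external encoding to ASCII-8BIT and writes the bytes unchanged. This is a standalone illustration with a simulated payload, not a tested fix for the Psych emitter.

```ruby
# coding: utf-8
require 'tmpdir'

original = "español"
# Assumption: the emitter's output arrives at IO#write labelled ASCII-8BIT.
payload = original.dup.force_encoding(Encoding::ASCII_8BIT)

utf8_error = nil
round_trip = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, 'test.yml')

  # Explicit UTF-8 external encoding: IO#write transcodes ASCII-8BIT -> UTF-8
  # and fails on the first non-ASCII byte (0xC3).
  begin
    File.open(path, 'w:utf-8') { |f| f.write(payload) }
  rescue Encoding::UndefinedConversionError => e
    utf8_error = e
  end

  # Binary mode sets the external encoding to ASCII-8BIT, so the bytes
  # are written unchanged.
  File.open(path, 'wb') { |f| f.write(payload) }

  # Reading the file back as UTF-8 recovers the original string.
  round_trip = File.read(path, encoding: 'utf-8')
end

puts utf8_error.class
puts round_trip == original
```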

--Ragav


Brad Heller

Jul 12, 2012, 3:06:48 AM
to pdx...@googlegroups.com
Kinda gross (and frankly disappointing), but is switching to the alternate
yamler a possibility? It doesn't seem to explode under syck, for whatever
reason.

I'd loooooooove to hear how others handle this problem. We run into
encoding issues all the time, and our usual solution is "force it!"
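
Assuming "force it" means String#force_encoding, here is a small sketch of what it does compared with String#encode on a mislabelled string. encode tries to transcode and hits the same kind of error as above, while force_encoding just relabels the existing bytes, which is only safe when you know they really are UTF-8.

```ruby
# coding: utf-8

# A UTF-8 string whose bytes were labelled ASCII-8BIT somewhere along the way.
mislabelled = "español".dup.force_encoding(Encoding::ASCII_8BIT)

# String#encode tries to transcode ASCII-8BIT -> UTF-8 and fails on 0xC3,
# because binary bytes above 0x7F have no defined character mapping.
encode_error =
  begin
    mislabelled.encode(Encoding::UTF_8)
    nil
  rescue Encoding::UndefinedConversionError => e
    e
  end

# String#force_encoding only relabels the same bytes; it does not convert
# them. Check valid_encoding? afterwards, since relabelling does no validation.
forced = mislabelled.dup.force_encoding(Encoding::UTF_8)

puts encode_error.class
puts forced.valid_encoding?
puts forced == "español"
puts forced.bytesize == mislabelled.bytesize
```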

Ragav Satish

Jul 12, 2012, 10:09:43 AM
to pdx...@googlegroups.com
Thanks Brad,

Syck is not a good workaround because it doesn't know much about encoding. I'm concerned that going that route will just cause bigger problems down the road.

Try this simple code and you'll see that syck isn't respecting the UTF-8 directive while psych is. I haven't found much information on what else psych does differently.

#coding: UTF-8
require 'yaml'
s = "español"
p "Target string - #{s}"
YAML::ENGINE.yamler = 'syck'
syck_str = s.to_yaml
p "Using #{YAML::ENGINE.yamler} #{syck_str} [encoded #{syck_str.encoding}]"

YAML::ENGINE.yamler = 'psych'
psych_str = s.to_yaml
p "Using #{YAML::ENGINE.yamler} #{psych_str} [encoded #{psych_str.encoding}]"

p "Loading syck yaml via psych gives #{YAML::load(syck_str)}"

Output:

"Target string - español"
"Using syck --- \"espa\\xC3\\xB1ol\"\n [encoded ASCII-8BIT]"
"Using psych --- español\n...\n [encoded UTF-8]"
"Loading syck yaml via psych gives español"
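
Syck was removed from the standard library in Ruby 2.0, so on current Rubies only the psych half of the comparison above still runs. A sketch of that half, checking the encoding of the emitted YAML and that it loads back to the same string (YAML::ENGINE no longer exists there, so the engine is not printed):

```ruby
# coding: utf-8
require 'yaml'

s = "español"
# Psych is the only YAML engine in the standard library since Ruby 2.0.
psych_str = s.to_yaml
puts "#{psych_str.inspect} [encoded #{psych_str.encoding}]"

# Loading the emitted YAML should give back an equal UTF-8 string.
reloaded = YAML.load(psych_str)
puts reloaded == s
```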

--Ragav

Jesse Hallett

Jul 12, 2012, 1:20:29 PM
to pdx...@googlegroups.com
Maybe we could update ZAML to operate in UTF-8?
https://github.com/hallettj/zaml