Thanks.
Bharat
What do you mean by "refuses to process them"? Are you seeing
mojibake? Or nothing at all?
Some questions come to mind:
- Is the DB connection set to use utf-8, in the case of Postgres? Not
sure how this is set for sqlite, but presume there is a way.
- Is your environment somehow using an encoding besides utf8?
Regards,
Ammar
Please, read this:
http://www.chiark.greenend.org.uk/~sgtatham/bugs.html
- Show us some code, show us the exact problem with a sample program.
- Tell us the exact versions, not just of Ruby, but also of your
operating system and the SQLite3 you're using.
Help us help you.
--
Luis Lavena
To make a long story short, it was the character encoding problem with
ruby 1.9.2.
The following is a snippet of code from seeds.rb file
courses = [ {:title => 'Principles of Good Cooking 1', :course_code => 'PGC1',
  :lessons => [{:title => 'Getting Started'},
    {:title => 'Sautéing',
     :topics => [ {:tag => "Lecture", :title => "Introduction to Sautéing",
                   :pages => [ {:title => "Video Lecture" }] },
                  {:tag => "Quiz", :title => "Test Your Sautéing IQ",
                   :pages => [ {:title => "Questions" }] },
                  {:tag => "Taste Test", :title => "Cooking With Wine",
                   :pages => [ {:title => "Introduction"},
                               {:title => "Instructions"},
                               {:title => "Taste Wine"},
                               {:title => "Reduce Wine"},
                               {:title => "Taste Reduced Wine"},
                               {:title => "Your Results" },
See that 'Sautéing' string?
That is "Sauteing" with the French acute accent over the e. It was
causing the rake db:seed command to fail (throw an exception) as follows:
bruparel:~/school
→ rake db:seed
(in /Users/bruparel/school)
rake aborted!
/Users/bruparel/school/db/seeds.rb:3: invalid multibyte char (US-ASCII)
/Users/bruparel/school/db/seeds.rb:3: invalid multibyte char (US-ASCII)
/Users/bruparel/school/db/seeds.rb:3: syntax error, unexpected $end, expecting '}'
{:title => 'Sautéing',
^
The solution was to put the following line at the top of this file
(seeds.rb)
# encoding: utf-8
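For illustration, here is a minimal sketch of what that magic comment affects. The variable name `title` is made up for this example. Note that on Ruby 2.0 and later the source encoding already defaults to UTF-8, so the comment only makes a visible difference on 1.9.x:

```ruby
# encoding: utf-8
# The magic comment must be the first line of the file
# (or the second, if the first is a shebang line).

title = 'Sautéing'

puts __ENCODING__      # source encoding the parser used for this file
puts title.encoding    # string literals inherit the source encoding
puts title.length      # counted in characters
puts title.bytesize    # counted in bytes; the accented e takes two in UTF-8
```

Without the comment, Ruby 1.9.2 parses the file as US-ASCII and rejects the non-ASCII byte sequence, which is the "invalid multibyte char (US-ASCII)" error shown above.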
Now rake db:seed ran fine and did populate the tables. I could see
the correct character encoding in the databases (both SQLite3 and
Postgres), but the display was coming out with plain "Sauteing" instead
of the accented French "é". That was because of the following line
in the database.yml file:
development:
  adapter: sqlite3
  pool: 5
  timeout: 5000
  encoding: utf8 <--- because of this
  database: db/atk_school_development
Instead it should be as follows:
development:
  adapter: sqlite3
  pool: 5
  timeout: 5000
  encoding: unicode <--- this works
  database: db/atk_school_development
If someone can articulate some simple rules for character encoding in
a Ruby 1.9.2-p0 and Rails 3.0.1 environment, that would be quite useful.
Ruby reads each file's 'encoding' magic comment to decide which
encoding to use for that particular file.
If the file lacks an encoding comment, it assumes the one provided by
Encoding.default_external, which in your case seems to be US-ASCII.
sqlite3-ruby, since 1.3.0, is quite aware of character encoding and
should work properly.
If Rails is not doing the right thing, that is another question.
You can double-check that by running:
ActiveRecord::Base.connection.execute 'PRAGMA encoding'
That will tell you which encoding the SQLite3 database was opened with.
Beyond that, for Rails-specific issues, ask on Rails-Talk:
http://groups.google.com/group/rubyonrails-talk
--
Luis Lavena
That answer is wrong - but I don't blame you for giving a wrong answer,
since the whole encoding nonsense in ruby 1.9 is ridiculously
complicated.
The correct answer is: the encoding of a ruby 1.9 source file (and hence
the String literals within that file) is *always* US-ASCII, unless you
tag it with a #encoding line which says otherwise.
I have so far collected about 200 rules for how encodings work in ruby
1.9: https://github.com/candlerb/string19/blob/master/string19.rb
Unfortunately, this list is just the tip of the iceberg. To be complete,
it would have to describe the encoding-related behaviour of every method
on String, every method which accepts a String, and every method which
returns a String.
Regards,
Brian.
That's a great collection of tips and rules. Thanks for sharing.
Cheers,
Ammar
This works for me and is consistent with my observation. Rails does set
a default encoding in config/application.rb, as shown below:
# Configure the default encoding used in templates for Ruby 1.9.
config.encoding = "utf-8"
It seems that the seeds.rb file, which is conventionally used to
initialize data, is unaware of this setting. Further, it seems that
this is not what the Rails team intended.
Regards,
Bharat
As the comment says, that setting is used for templates, but seeds.rb is
ruby source code.
When you read a ruby 1.9 source file using load() or require(), then the
encoding is always forced to US-ASCII unless you tag it with a
#encoding. That is actually a sane default - imagine what would happen
if the same source file were parsed differently depending on what system
it ran on (*).
It gets more complex if instead of using load() or require(), you read
the file into a String and then eval() that String. In that case, the
encoding of the String is used as the source encoding, unless overridden
by a #encoding line.
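A rough sketch of the eval() case, assuming the behaviour described above. The variable names are invented for this example, and ISO-8859-1 is picked only because it is easy to tell apart from the usual defaults:

```ruby
# Tag an (ASCII-only) piece of source text as ISO-8859-1.
# .dup gives us a mutable copy before changing its encoding tag.
source = "__ENCODING__".dup.force_encoding(Encoding::ISO_8859_1)

# eval() parses the String using the String's own encoding.
from_string = eval(source)
puts from_string

# The same text with a magic comment on its first line; the comment is
# expected to take precedence over the String's encoding tag.
tagged = "# encoding: utf-8\n__ENCODING__".dup.force_encoding(Encoding::ISO_8859_1)
puts eval(tagged)
```

The first call reports the encoding of the String that was passed in, not the encoding of the file that called eval().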
Regards,
Brian.
(*) However, the same program may still behave differently on different
systems, even if parsed identically. This is because the default is to
allow the environment to decide the encoding of data files. You need to
explicitly override this if you want your program to behave in a sane
fashion, and that's what Rails is doing: whenever it reads a template,
it applies its own config.encoding setting instead of letting Ruby pick
an (essentially arbitrary) encoding.
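To make the footnote concrete, here is a small sketch of reading a data file with and without an explicit encoding. The temp file and variable names are hypothetical; this only shows the general idea and is not how Rails itself reads templates:

```ruby
require "tempfile"

# Write a small data file containing raw UTF-8 bytes.
file = Tempfile.new("encoding_demo")
file.binmode
file.write("Saut\u00E9ing")
file.close

# Without an explicit encoding, the String is tagged with whatever
# Encoding.default_external happens to be in this environment.
implicit = File.read(file.path)

# With an explicit encoding, the result does not depend on the environment.
explicit = File.read(file.path, encoding: "UTF-8")

puts "default_external: #{Encoding.default_external}"
puts "implicit read:    #{implicit.encoding}"
puts "explicit read:    #{explicit.encoding}"

file.unlink
```

Running this under different locale settings (for example LANG=C versus LANG=en_US.UTF-8) should change the first two lines of output but not the third.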
Thank you, Brian, for correcting me. Encoding has always been on my TODO
list.
--
Luis Lavena