Tables with non-ASCII chars in column names


KD.Gun...@zwick-edelstahl.de

May 25, 2010, 7:43:18 AM
to Rails SQLServer Adapter
Reopening the thread: "2.2.19 breaks on tables with non-ASCII chars in
column names"
but now with sqlserver-adapter-2.3.5 :

I have to use a legacy SQL Server 2005 database, which uses German
umlauts in the column names.
Running Ruby 1.9 and Rails 2.3.5 on Windows, I have some problems using
these columns:

rs = Tablewithumlaute.first
rs.inspect
=> "#<Tablewithumlaute: Uml\xE4ute: \"\\xF6\\xE4\\xFC\">"

So the column name Umläute is converted to Uml\xE4ute

rs.attribute_names.each { |attr| puts attr.encoding }
ASCII-8BIT

y rs
ArgumentError: invalid byte sequence in UTF-8
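
For readers following along, here is a minimal sketch of what those bytes mean. It assumes the driver hands back the column name in a single-byte Windows code page; Windows-1252 is a guess, since the thread does not say which code page is in use. Under that assumption the byte \xE4 is "ä", so re-tagging the binary string and transcoding it gives valid UTF-8:

```ruby
# encoding: utf-8

# The column name as it shows up in the inspect output above:
# raw bytes tagged as binary (ASCII-8BIT), with 0xE4 standing in for "ä".
raw_name = "Uml\xE4ute".b

# Assumption: the bytes are Windows-1252. Re-tag a copy with that
# encoding, then transcode the copy to UTF-8.
utf8_name = raw_name.dup.force_encoding("Windows-1252").encode("UTF-8")

puts utf8_name                  # Umläute
puts utf8_name.valid_encoding?  # true
```

Note that force_encoding only changes the label on the bytes, while encode actually rewrites them. That difference is why the re-tagging step has to name the encoding the bytes are really in.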

-----------------------------------------
Accessing the column in a controller:

# coding: utf-8
def index
  Tablewithumlaute.find(:all, :conditions => ["Umläute = ?", "a"])
end

gives a SQL Server error (roughly translated):
"A part of the SQL statement is too deeply nested: SELECT * FROM
Tablewithumlaute WHERE (Uml├Âte = 'a')"

I am totally lost as to where to search. I have found that there are two
versions of Ruby-ODBC (odbc and odbc_utf8) but could not find out whether
the sqlserver-adapter uses odbc_utf8. And is it UTF-8 only for the
data, or also for column names?

Greetings
Klaus

--
You received this message because you are subscribed to the Google Groups "Rails SQLServer Adapter" group.
To post to this group, send email to rails-sqlse...@googlegroups.com.
To unsubscribe from this group, send email to rails-sqlserver-a...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/rails-sqlserver-adapter?hl=en.

Ken Collins

May 25, 2010, 7:49:14 AM
to rails-sqlse...@googlegroups.com
It is recommended that you use the UTF8 version of ODBC. When you build/install it, the library sometimes ends up with "_utf8" appended to its name. Just go into the vendor or site ruby directory and rename it.


KD.Gun...@zwick-edelstahl.de

May 25, 2010, 11:48:57 AM
to Rails SQLServer Adapter
Hi Ken,

after "gem install ruby-odbc" there are four libraries:
c:\Ruby19\lib\ruby\gems\1.9.1\gems\ruby-odbc-0.99991\ext\odbc.so
c:\Ruby19\lib\ruby\gems\1.9.1\gems\ruby-odbc-0.99991\ext\utf8\odbc_utf8.so
c:\Ruby19\lib\ruby\gems\1.9.1\gems\ruby-odbc-0.99991\lib\odbc.so
c:\Ruby19\lib\ruby\gems\1.9.1\gems\ruby-odbc-0.99991\lib\odbc_utf8.so

and Christian tells us in his README:
   Thus, depending on the -K option of ruby one could use
   that code snippet:

     ...
     if $KCODE == "UTF8" then
       require 'odbc_utf8'
     else
       require 'odbc'
     end

But in sqlserver_adapter.rb I can only find this line:
  require_library_or_gem 'odbc' unless defined?(ODBC)

So I inserted a require "odbc_utf8" in environment.rb.

Now, for column names,
  column_names.each { |col| puts col.encoding }
tells me that all column names are still ASCII-8BIT,
but the names are OK on the web page.

But for the data itself:
for columns declared as varchar() it tells me that the encoding is
ASCII-8BIT,
and the data is OK on the web page;
for columns declared as nvarchar() it tells me that the encoding is
UTF-8,
but the data fails to display with the error "incompatible character
encodings: ASCII-8BIT and UTF-8".

If I start Ruby with ruby -Eutf-8, the effect is reversed:
varchar() now fails with the error and nvarchar() shows up OK.
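
The error quoted above can be reproduced without a database. The following is only a sketch with made-up stand-in values, not output from the adapter: Ruby 1.9 and later refuse to join two strings when both contain non-ASCII bytes and carry different encodings, while a binary string holding only ASCII bytes joins with UTF-8 without complaint.

```ruby
# encoding: utf-8

# Stand-in for a varchar value: non-ASCII bytes tagged as binary (ASCII-8BIT).
varchar_value = "\xF6\xE4\xFC".b
# Stand-in for an nvarchar value: the same letters as a UTF-8 string.
nvarchar_value = "öäü"

# Joining the two, as a template does when it renders a page, raises.
error = nil
begin
  varchar_value + nvarchar_value
rescue Encoding::CompatibilityError => e
  error = e
end
puts error.class

# A binary string that holds only ASCII bytes is compatible with UTF-8.
ascii_only = "abc".b + nvarchar_value
puts ascii_only.encoding
```

This may be why a page built from one kind of column renders fine while mixing in the other kind fails: the failure depends on which encoding the rest of the page already has.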

( All Tests were running on Windows XP, ruby 1.9.1p378 [i386-mingw32],
Rails 2.3.5, ruby-odbc 0.99991
using Microsoft SQL Server Native Client 9.00.4035.00 against an SQL
Server 2005 on Windows Server 2003 )

As Yehuda says on http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/
" Unfortunately, this simply means that people who do encounter it are
baffled and find it hard to get help. "

Ken Collins

May 25, 2010, 12:43:23 PM
to rails-sqlse...@googlegroups.com

Hey,

> after "gem install ruby-odbc" there are four libraries:
> c:\Ruby19\lib\ruby\gems\1.9.1\gems\ruby-odbc-0.99991\ext\odbc.so
> c:\Ruby19\lib\ruby\gems\1.9.1\gems\ruby-odbc-0.99991\ext\utf8\odbc_utf8.so
> c:\Ruby19\lib\ruby\gems\1.9.1\gems\ruby-odbc-0.99991\lib\odbc.so
> c:\Ruby19\lib\ruby\gems\1.9.1\gems\ruby-odbc-0.99991\lib\odbc_utf8.so
>
> and Christian tells us in his README:
>    Thus, depending on the -K option of ruby one could use
>    that code snippet:
>
>      ...
>      if $KCODE == "UTF8" then
>        require 'odbc_utf8'
>      else
>        require 'odbc'
>      end
>
> But in sqlserver_adapter.rb I can only find this line:
>   require_library_or_gem 'odbc' unless defined?(ODBC)
>
> So I inserted a require "odbc_utf8" in environment.rb,

That's a reasonable config for someone who wants both versions installed. An argument could be made that the adapter should do this itself when "encoding: utf8" is found in the configuration, but there are a lot of odd corners here. My assumption is that when people install unixODBC they just pick one or the other and never have 'odbc_utf8' lying around. The adapter is not going to handle all the crazy permutations of end-user setups, which is why it lets you define ODBC first before requiring. I'm open to ideas/feedback, but even though this caused you some pain, this just seems normal.
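
Because the adapter only requires 'odbc' when the ODBC constant is not yet defined, an application can load its preferred binding first. A minimal sketch of that idea follows; the helper name require_first_available is hypothetical and is not part of the adapter or of ruby-odbc:

```ruby
# Hypothetical helper: try each library name in order and return the
# first one that loads, or nil if none of them is installed.
def require_first_available(candidates)
  candidates.each do |name|
    begin
      require name
      return name
    rescue LoadError
      next
    end
  end
  nil
end

# Prefer the UTF-8 build, fall back to the plain one. Assumed placement:
# config/environment.rb, before the adapter is loaded.
loaded = require_first_available(%w[odbc_utf8 odbc])
puts "ODBC binding loaded: #{loaded.inspect}"
```

With this in place the adapter's own `require_library_or_gem 'odbc' unless defined?(ODBC)` line is skipped whenever one of the two bindings has already defined ODBC.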

> Now, for column names,
>   column_names.each { |col| puts col.encoding }
> tells me that all column names are still ASCII-8BIT,
> but the names are OK on the web page.

I think you're mixing concerns here, or I'm not understanding. ActiveRecord::Base.column_names is just an array of strings. Nothing about it is adapter related besides how those strings are populated. What you typed above is equivalent to:

["col1name", "col2name"].each { |string| string.encoding }

Here is a real "column" object.

=> #<ActiveRecord::ConnectionAdapters::SQLServerColumn:0x105db50f0 @precision=nil, @primary=true, @default=nil, @type=:string, @limit=18, @null=false, @scale=nil, @name="foo_id", @sqlserver_options={:is_identity=>nil, :length=>18, :numeric_precision=>nil, :table_name=>"foos", :numeric_scale=>nil}, @sql_type="nvarchar(18)">

As you can see the sql type is a national/unicode column and this object responds true to #is_utf8?
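
A check like that can be derived from the sql_type string alone. The sketch below uses a hypothetical ColumnSketch struct and a national? method as stand-ins; it is not the adapter's actual implementation of #is_utf8?, only an illustration of the idea:

```ruby
# Hypothetical stand-in for the adapter's column class; only the
# sql_type string matters for this check.
ColumnSketch = Struct.new(:name, :sql_type) do
  # True for SQL Server's national (Unicode) character types.
  def national?
    !!(sql_type =~ /\A(nchar|nvarchar|ntext)\b/i)
  end
end

foo_id = ColumnSketch.new("foo_id", "nvarchar(18)")
title  = ColumnSketch.new("title", "varchar(50)")

puts foo_id.national?  # true
puts title.national?   # false
```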

> But for the data itself:
> for columns declared as varchar() it tells me that the encoding is
> ASCII-8BIT,
> and the data is OK on the web page;
> for columns declared as nvarchar() it tells me that the encoding is
> UTF-8,
> but the data fails to display with the error "incompatible character
> encodings: ASCII-8BIT and UTF-8".
>
> If I start Ruby with ruby -Eutf-8, the effect is reversed:
> varchar() now fails with the error and nvarchar() shows up OK.
>
> ( All Tests were running on Windows XP, ruby 1.9.1p378 [i386-mingw32],
> Rails 2.3.5, ruby-odbc 0.99991
> using Microsoft SQL Server Native Client 9.00.4035.00 against an SQL
> Server 2005 on Windows Server 2003 )

I'm still a bit confused... I can tell you what the adapter does. Basically, if a column is national/unicode and you're on Ruby 1.9.x, it force-encodes the value accordingly so that the string is what you would expect. For me all the tests pass on 1.8 and 1.9. You may want to run them and see where/if they fail, and/or show me a failing test, so I can understand whether this is an adapter issue you're talking about or just the normal pain Yehuda was describing :)
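
The force-encoding step described above might look roughly like this. It is a sketch under stated assumptions, not the adapter's code: tag_value is a hypothetical helper, and the driver is assumed to return UTF-8 bytes tagged as binary.

```ruby
# encoding: utf-8

# Hypothetical helper: label a raw value as UTF-8 when it came from a
# national column. On Ruby 1.8 strings have no force_encoding, so the
# value is returned unchanged there.
def tag_value(value, national)
  return value unless national && value.respond_to?(:force_encoding)
  value.dup.force_encoding(Encoding::UTF_8)
end

# Assumed driver output: UTF-8 bytes carrying a binary label.
raw = "Umläute".b

tagged    = tag_value(raw, true)   # nvarchar-style column
untouched = tag_value(raw, false)  # varchar-style column

puts tagged.encoding       # UTF-8
puts tagged == "Umläute"   # true
```

Only the label changes here; the bytes are left alone. If the driver returned bytes in some other code page, this step would produce an invalid UTF-8 string rather than fix it.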

Lemme know.

 - Ken


Ken Collins

May 25, 2010, 12:45:16 PM
to rails-sqlse...@googlegroups.com

FYI, as general feedback on the subject: I'm not sure there are tests in AR and/or the adapter that cover non-ASCII chars in column names. File a ticket on GitHub and I'll write a test for you and take a look. Thinking about it more, the force encoding may be needed for column names too.

 - Ken

Ken Collins

May 25, 2010, 10:58:17 PM
to rails-sqlse...@googlegroups.com

I tried making a patch for this that encoded the column name for the SQLServerColumn object and found that various insert/update statements then failed badly because of cross-encoding compatibility issues. I feel something can be done, and if you have the time, please do investigate this. Let me know what I can do to help.

 - Ken

KD.Gun...@zwick-edelstahl.de

May 26, 2010, 1:27:23 PM
to Rails SQLServer Adapter
Hi Ken,

I will get some help from a Ruby professional tomorrow and we will try
to set up a test environment.
Should I test against the new 2.3.6 release, against 2.3.5, or both?
Currently I only have SQL Server 2005 installed, and I don't know if
I will have enough time to also install and test SQL Server 2000 and
2008, sorry for that.
I have been reading the Microsoft documentation on SQL Server and found
that for all SQL Server releases 2000/2005/2008, table names and
column names are allowed to be Unicode strings. I will try to tackle this
tomorrow.

Greetings

Klaus

Ken Collins

May 26, 2010, 1:53:42 PM
to rails-sqlse...@googlegroups.com

You should always check out master from the git repo when testing. As for the DB on the backend, that's your choice. Let me worry about the cross 2000/2005/2008 stuff. You can focus on the base.

On a side note, I just found out that Ruby method names can be Unicode strings, which makes sense, but I had never tried it. I mention it because if that were not the case, those column names could not become real method names on an AR object.
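
A quick way to see this for yourself, as a sketch: RecordSketch below is a made-up class, not ActiveRecord, and it simply defines one reader method per attribute name.

```ruby
# encoding: utf-8

# Made-up class for illustration; it is not ActiveRecord.
class RecordSketch
  def initialize(attributes)
    @attributes = attributes
  end

  # Define one reader per attribute name, non-ASCII names included.
  def self.define_readers(*names)
    names.each do |name|
      define_method(name) { @attributes[name] }
    end
  end
end

RecordSketch.define_readers("Umläute")
record = RecordSketch.new("Umläute" => "öäü")

puts record.Umläute                  # öäü
puts record.respond_to?("Umläute")   # true
```

The name has to be in an encoding that matches the source file for the literal call `record.Umläute` to find it, which ties back to the column-name encoding question in this thread.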

- Ken
