Codec MS-SQL, unicode Latin1 UTF8

518 views
Skip to first unread message

J-Michel Angers

unread,
May 18, 2017, 1:06:29 PM5/18/17
to web2py-users
Hello,
For my app, with data in SQL-Server 2012, I cannot have all letter correctly written.

First, in SQL-Server, there are 2 types for storing a string :
 * char and varchar are string types with "old" charset, ie not uniccode, and dependant on the site.
 * nchar, nvarchar are string types with unicode. We can store text from our country, but also stranger text.
For me, in France, the database is declared "Collation = French_Cl_AS", this is équivalent to "Latin1".
For my app, I use all types of data :
a "code" is allways varchar(12), for example Supplier-code, Article-code, Supplier-address, accounting-acct, name of persons, ...
but a comment field can countain many thing.

I think I've tried all combinations to convert my app. Each combination works for a part, but doen't work in another part.
For both case, I have :
a => I already have data in my database, that are put from MS-Access, or directly from "SQL Server Management studio" (parameters list, to use with drop-down fields)
        In my database, I have data like "Frais Généraux", "Certificat de confirmité", .... The app has not to change this values
        I also have data that are a request that joins data from my app database, and my ERP-database (list of suppliers with address like "Saint Nom la Bretèche".
b => I try to type a sentence in my SQLFORM, and this data is written in the database.
       The sentence is ==> l ' élève est dans la forêt, où ça ?
I write it in 2 fields, one in varchar, the other in nvarchar.

First, I set in the DAL "db_codec='Latin1' "
1a - All that data are well displayed in my app.
1b - For both fields, when I display the form, the systems shows ==> L'élève est dans la forêt, où ça ?
       And into the database, when I display it ==> L'élève est dans la forêt, où ça ?
       And if I request data from another tool (MS-Excel, that connects to the same database with an MS-Query), I also get L'élève est dans la forêt, où ça ?
        If I "valid" the form twice (the displayed data are unchanged), I get L'élève est dans la forêt, où ça ?

Second, if I set the DAL with UTF-8 (so without any codec):
2a - The existing data (typed in Sqlserver Management Studio, or a request from different data including ERP) are not correct
      ==> Frais g�n�raux
2b - For my favorite sentence (l'élève est dans la forêt, où ça ?), this is correctly shown when I display it again in the SQLFORM.
  But when I extract data with a third-party tool, or look at the data with MS-Man-Studio, I get :
      ==> L'élève est dans la forêt, où ça ?

The codec "Latin1" seems to be best :
OK => I can display data from other tables (that comes from other tools, like ERP, or MS-Access apps)
OK => When a field already contains a sentence, it's correctl displayed (codec can very well DE-code the Latin1)
Not OK : when the use type a sentence in an SQLFORM, this sentence is bad EN-coded to Latin1.
Curiosly, the result is exactly the same with varchar fields and nvarchar fields. I'm surprised of this.
I've read a lot af things, and many post about codec and ODBC-link with MS-SQL.. I don't find the solution for my case.

Skiros

unread,
May 18, 2017, 7:54:12 PM5/18/17
to web2py-users
Hello J-Michel Angers

I had have that problem and I dont remember how i fixed... jajaja

what type of fields you are using ? nchar o char ? this determines what kind of connection need to use. There are a few options; mssql, mssql2, mssql3 and mssql4. If you are using nchar you need mssql2.

In my esperience, you can use the others options and can still read data in a proper way but are more dificult because you need to decode and encode things all the time and take care of anything are talking in the same encodings...

The encoding of you python and html files are important too, dont forget the encoding sentence at begining of your python code.

I minimize problems with this configurations:
# -*- coding: utf-8 -*- in the python files.
UTF8 without BOM as enconding format in of any files in the proyect.

Jose

unread,
May 18, 2017, 9:07:27 PM5/18/17
to web2py-users
Hi J-Michel Angers

Given:
pydal <-> pyodbc  <-> UnixODBC  <-> FreeTDS  <-> MSSQL

In freetds.conf I have:

client charset = UTF-8

and in pydal I use mssql2 adapter, db_codec is not necessary

Jose

J-Michel Angers

unread,
May 19, 2017, 4:23:03 AM5/19/17
to web2py-users
Hello,
Many thanks at both of you :-)
I've just made a lot of tests, a lot of combinations....

I confirm : mssql4 is NOT the good connector for this. mssql2 works much better ! !
For unicode fields (type nvarchar in SQL-Server), mssql2 without codec works very well, but doen't work for varchar fields (not unicode in SQL-Server).
For traditionnal fields (type = varchar), mssql2 AND db_codec=latin1  make the fields working perfectly !

mssql3 also doesn't work.

So I can now continue working on my workflow, it's very plaisant to see beautiful letters, like I used to :-)
So my definitiv choice : mssql2 AND db_codec='latin1'.

Thank you very much, and long life to web2py.

Tomasz Wrona

unread,
May 23, 2017, 8:30:08 AM5/23/17
to web...@googlegroups.com
HI,

I am using mssql4 adapter.

For table:

CREATE TABLE myTable (
Column_a varchar (10) COLLATE French_CI_AS NOT NULL,
Column_b varchar (10) COLLATE Latin1_General_CI_AS NOT NULL,
Column_c varchar (10) COLLATE Estonian_CS_AS NOT NULL)

I create a view:

CREATE VIEW v_myTable
AS
SELECT
CONVERT (nvarchar (10), Column_a) AS Column_a,
CONVERT (nvarchar (10), Column_b) AS Column_b,
CONVERT (nvarchar (10), Column_c) AS Column_c
FROM myTable

I connect the adapter to the view
Now everything works well

tim.n...@conted.ox.ac.uk

unread,
May 25, 2017, 11:53:29 AM5/25/17
to web2py-users
I use this same stack, and ran into encoding issues when updating the database, because of FreeTDS v.91.

But upgrading to FreeTDS v1.00.40 fixed all unicode issues:
https://github.com/mkleehammer/pyodbc/issues/244

Connect with mssql4n, TDS_Version 7.1+, no need to set db_codec
Reply all
Reply to author
Forward
0 new messages