I'm using pyodbc against a SQL Server database; some of the columns are nvarchar, others are varchar. I'm having a problem where if I feed all unicode to pyodbc, particularly where I query against a column of type varchar, that the results take a significantly longer time to return than they would if I feed it a string. However, if I feed strings to pyodbc (encoded as utf8), then I lose the characters that normally can be represented in my nvarchar fields, if that makes sense. The documentation for types on the pyodbc site says that when given strings it assumes ASCII and uses SQL_CHAR which I'm assuming is why the data in the utf8 string that cannot be represented in ASCII is lost.
To be more specific, I'm working in Django, using a fork of django-pyodbc that I've been working on:
https://github.com/avidal/django-pyodbc and I'm trying to figure out a way to sanely determine how to encode the data that the library sends to pyodbc. Because of the nature of the ORM there's currently no way to indicate the encoding of a specific field without creating new fields types (which I'll do if I must), so my current goal is to come up with the right combination of encoding to Do The Right Thing.
I'm using FreeTDS 0.91 with client charset set to UTF-8 in freetds.conf. I'm guessing that at some point, parameters that are used to query against a column of type varchar/char/text need to be encoded to the native type (latin1), while using unicode for the rest of the parameters (to support multiple languages and such). What's the best approach? Should I use `cursor.columns` to determine the column types (and appropriate encoding) and then use that information to do the actual encoding for the non-unicode fields?
The table I'm querying against currently contains 24 million rows; and the column that I query on is varchar(50). Using a string for the query returns results almost immediately, using unicode (which I'm assuming generates a query such as: SELECT * FROM tbl WHERE id = N'<param>') takes upwards of 90 seconds.