diff --git a/src/backend.jl b/src/backend.jl
index b5f24af..bf4ee11 100644
--- a/src/backend.jl
+++ b/src/backend.jl
@@ -40,7 +40,7 @@ end
 # Send query to DMBS
 function ODBCQueryExecute(stmt::Ptr{Void}, querystring::AbstractString)
-    if @FAILED SQLExecDirect(stmt, utf16(querystring))
+    if @FAILED SQLExecDirect(stmt, utf8(querystring))
         ODBCError(SQL_HANDLE_STMT,stmt)
         error("[ODBC]: SQLExecDirect failed; Return Code: $ret")
     end
#SQLExecDirect
#Description: executes a preparable statement
#Status:
function SQLExecDirect(stmt::Ptr{Void},query::AbstractString)
    @windows_only ret = ccall( (:SQLExecDirect, odbc_dm), stdcall,
        Int16, (Ptr{Void},Ptr{UInt8},Int),
        stmt,query,sizeof(query))
    @unix_only ret = ccall( (:SQLExecDirect, odbc_dm),
        Int16, (Ptr{Void},Ptr{UInt8},Int),
        stmt,query,sizeof(query))
    return ret
end
The C declaration of SQLExecDirect is:
SQLRETURN SQLExecDirect(
    SQLHSTMT StatementHandle,
    SQLCHAR * StatementText,
    SQLINTEGER TextLength);
and SQLCHAR is defined as unsigned char. So this would seem to be a non-wide character string, i.e. ASCII or UTF-8. And indeed, that's what the Microsoft SQL driver seems to be expecting.
The question I have is this: how the heck is this working for other ODBC drivers? How are they getting pointers to UTF-16 data and interpreting it correctly? The correct fix would seem to be to make this always send UTF-8 strings. But when I made a PR that did that, it seemed to break other setups.
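To make the mismatch concrete, here is a rough Python sketch (not ODBC.jl code; the query string and the `until_nul` helper are made up for illustration) of the bytes that end up in the buffer under each encoding, and of what a consumer that treats the buffer as a NUL-terminated 8-bit string would read from it. Whether a given driver actually stops at the first NUL is an assumption, since SQLExecDirect also receives an explicit length.

```python
query = "SELECT 1"  # hypothetical, ASCII-only query

utf8_bytes = query.encode("utf-8")
utf16_bytes = query.encode("utf-16-le")  # little-endian, no byte-order mark


def until_nul(buf: bytes) -> bytes:
    """Return what a C-string reader would see: everything before the first NUL."""
    return buf.split(b"\x00", 1)[0]


print(utf8_bytes)              # 8 bytes, one per ASCII character
print(utf16_bytes)             # 16 bytes, a NUL after every ASCII character
print(until_nul(utf16_bytes))  # only the first character survives
```

So a driver that really interprets the buffer as 8-bit text should choke on UTF-16 input almost immediately, which is why it is surprising that any setup works with it.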
NOTE: The mapping of SQLWCHAR type is somewhat complicated and it can
create hidden pitfalls for programmers porting their code from
Windows to Linux. Usually a SQLWCHAR character is a 16-bit unit and
we will not consider the exotic cases when SQLWCHAR is different.
Windows uses UTF-16 and maps SQLWCHAR to 16-bit wchar_t type.
However, many Linux versions such as Ubuntu Linux use UTF-32 as an
internal character set and therefore their 32-bit wchar_t is not
compatible with SQLWCHAR, which is always 16-bit.
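A quick way to check which situation the note describes applies on a given machine is to ask for the platform's wchar_t size. The sketch below uses Python's ctypes for that; the variable names are illustrative, and the 16-bit SQLWCHAR width is taken from the note above rather than read from any ODBC header.

```python
import ctypes

# Size of the C wchar_t type on this platform, in bytes (typically 2 on Windows, 4 on Linux).
wchar_size = ctypes.sizeof(ctypes.c_wchar)

# Width of a 16-bit code unit, which is what the note says SQLWCHAR usually is.
sqlwchar_size = ctypes.sizeof(ctypes.c_uint16)

# Only when the two match can a wchar_t buffer be handed over as SQLWCHAR unchanged.
compatible = wchar_size == sqlwchar_size

print(f"wchar_t is {wchar_size * 8} bits, compatible with 16-bit SQLWCHAR: {compatible}")
```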
SQLCHAR is for encodings with 8-bit code units. It doesn't imply ASCII or UTF-8 (probably one of the more common character sets used with that is actually Microsoft's CP1252, which is often mistakenly described as ANSI Latin-1 - of which it is a superset).
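As a small illustration of why "8-bit code units" does not pin down the encoding, the Python sketch below decodes the same two bytes three ways. The byte values are chosen purely for illustration.

```python
raw = b"\x80\xe9"  # two bytes of 8-bit text in an unknown encoding (illustrative)

as_cp1252 = raw.decode("cp1252")   # Euro sign followed by e-acute
as_latin1 = raw.decode("latin-1")  # C1 control character U+0080 followed by e-acute

# The same bytes are not valid UTF-8 at all: 0x80 is a stray continuation byte.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(repr(as_cp1252), repr(as_latin1), valid_utf8)
```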
Even when something says it is UTF-8, it frequently is not *really* valid UTF-8. There are two common variants: CESU-8, used by MySQL and others, which encodes any non-BMP code point as its two UTF-16 surrogates, each written as a 3-byte sequence, i.e. 6 bytes instead of the correct 4-byte UTF-8 sequence; and Java's Modified UTF-8, which is the same as CESU-8 except that embedded \0s are also encoded in a "long" form (0xc0 0x80).
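The difference is easy to see on a single non-BMP character. The following Python sketch hand-rolls both variants (the function names `cesu8_encode` and `modified_utf8_encode` are mine, not from any library, and the input is assumed to contain no lone surrogates) and compares them with real UTF-8.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text as CESU-8: each non-BMP code point becomes two 3-byte surrogates."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            # Split into the UTF-16 surrogate pair, then encode each half
            # with the ordinary 3-byte UTF-8 bit pattern.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            for unit in (high, low):
                out += chr(unit).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8")
    return bytes(out)


def modified_utf8_encode(text: str) -> bytes:
    """Java-style Modified UTF-8: CESU-8, with U+0000 written as 0xC0 0x80."""
    # A 0x00 byte can only come from U+0000, so a byte-level replace is safe here.
    return cesu8_encode(text).replace(b"\x00", b"\xc0\x80")


emoji = "\U0001F600"  # a non-BMP code point, chosen for illustration
print(emoji.encode("utf-8").hex())      # 4 bytes in real UTF-8
print(cesu8_encode(emoji).hex())        # 6 bytes in CESU-8
print(modified_utf8_encode("a\x00b"))   # embedded NUL in the "long" form
```

A strict UTF-8 decoder rejects the 6-byte form, which is how data labelled "UTF-8" can still fail validation downstream.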
SQLCHAR is for encodings with 8-bit code units.
Not a model of clarity (ANSI and Unicode are not encodings), but this page seems to be the best resource on this:
It seems that there's a parallel "Unicode" API for ODBC drivers that support it. Moreover:
Currently, the only Unicode encoding that ODBC supports is UCS-2, which uses a 16-bit integer (fixed length) to represent a character. Unicode allows applications to work in different languages.
So using Klingon is off the table. Although the design of UTF-16 is such that sending UTF-16 to an application that expects UCS-2 will probably work reasonably well, as long as it treats it as "just data".
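Here is a rough Python sketch of what "just data" means in practice. The sample string is illustrative, and the "UCS-2 consumer" is simulated by reinterpreting the UTF-16 buffer as independent 16-bit units; no real driver is involved.

```python
import struct

text = "a\U0001F600"  # one BMP character plus one non-BMP character (illustrative)

utf16 = text.encode("utf-16-le")
# A UCS-2 consumer sees nothing but a sequence of 16-bit units.
units = struct.unpack(f"<{len(utf16) // 2}H", utf16)

code_points = len(text)  # 2 code points
ucs2_chars = len(units)  # 3 units: the surrogate pair counts as two "characters"

# Passing the units through unchanged preserves the text exactly.
round_trip = struct.pack(f"<{len(units)}H", *units).decode("utf-16-le")

# Cutting at a "character" boundary can split the pair and leave a lone surrogate.
truncated = struct.pack("<2H", *units[:2])

print([hex(u) for u in units], round_trip == text)
```

So storing and returning the units untouched is fine, but anything that counts, truncates, or indexes by "character" on the UCS-2 side can break text outside the BMP.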
This still doesn't explain why some drivers are accepting UCS-2/UTF-16 when called with the non-Unicode API.
This still doesn't explain why some drivers are accepting UCS-2/UTF-16 when called with the non-Unicode API.
When you do so, are you actually calling the functions with the A suffix, or just the macro without either A or W?
The macro will compile to either the A or the W form, depending on how your application is built.
This is a better page in MSDN: https://msdn.microsoft.com/en-us/library/ms712612(v=vs.85).aspx describing what is going on.
https://msdn.microsoft.com/en-us/library/ms716246%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396
suggests that if you call the version without the A or W suffix you get the ANSI version.