On 2018-01-12 16:33, Mike Bayer wrote:> On Fri, Jan 12, 2018 at 7:14 AM, Stefan Schwarzer
> <
sschw...@sschwarzer.net> wrote:
> In SQLAlchemy 1.1 series and earlier, you can specify
> exclude_setinputsizes=() to have STRING be part of the automatic
> setinputsizes call. In SQLAlchemy 1.2 these features were all removed
> as there was never any reason to pass most datatypes to setinputsizes.
> in 1.2 you can still say
> engine.dialect._include_setinputsizes.add(cx_Oracle.STRING) to re-add
> it, but this should not be necessary (my test works with or without
> it).
>
> Also which version of Python you're using matters, however I get a
> round trip under both pythons.
>
> To round trip it, do this - note I had to expand the size of the
> VARCHAR to fit your string, it was giving me a "data too large" error
> before, so that might be a clue:
> [...]
I tried your example and got the same mismatch between
original and actual value as before. Also your workaround
that I tried for SQLAlchemy 1.2 didn't work for me. I
haven't tried the workaround for SQLAlchemy 1.1.15. As
described below, I (partially) found out why the workaround
for version 1.2 didn't work.
Since your code worked for you, but not for me, I concluded
that probably the setup of your database is different from
"mine" (I don't control the server). I searched for
information on encodings in Oracle and found this document:
https://docs.oracle.com/en/database/oracle/oracle-database/12.2/nlspg/supporting-multilingual-databases-with-unicode.html#GUID-AA09A60E-123E-457C-ACE1-89E4634E492C
So far I had assumed that if I use `NVARCHAR2`, the column
data would be "unicode" and the Oracle driver would do all
the necessary conversions.
However, with `SELECT * FROM V$NLS_PARAMETERS` I got
NLS_CHARACTERSET WE8MSWIN1252
NLS_NCHAR_CHARACTERSET AL16UTF16
What are the values for your database?
As far as I understand the above Oracle document, the first
encoding, `NLS_CHARACTERSET`, is used for the names of SQL
entity names like table and column names, and also for
columns defined with `VARCHAR2`, `CLOB` etc. (character data
types without the `N` prefix). The second encoding,
`NLS_NCHAR_CHARACTERSET` is used for `NVARCHAR2`, `NCLOB`
etc. (character data types with the `N` prefix).
According to the document, Oracle nowadays recommends
`AL32UTF8` for the database encoding and not using
`NVARCHAR2` etc. for columns because `AL32UTF8` for
`VARCHAR2` can generally encode unicode.
I assume that I won't be able to have the above database
encodings changed. In this case, what do you recommend when
using the database with SQLAlchemy? As far as I understand,
if I use `VARCHAR` or `VARCHAR2` (as other tables in the
database so far), I won't be able to store anything that
doesn't belong in CP1252. If I use `NVARCHAR2`, I _should_
be able to use Chinese characters.
Why not? I changed your suggested code for SQLAlchemy 1.2
to include
e.dialect._include_setinputsizes.add(cx_Oracle.STRING)
e.dialect._include_setinputsizes.add(cx_Oracle.NCHAR)
e.dialect._include_setinputsizes.add(cx_Oracle.NCLOB)
but still inserting the Chinese characters failed.
With some debugging, I noticed that in
`engine.default.DefaultExecutionContext.set_input_sizes`
before the call to `cursor.setinputsizes`, the `text` column
has a corresponding `cx_Oracle.STRING` in the `inputsizes`
dictionary. However, the type of `text` (an `NVARCHAR2`
column) should be `cx_Oracle.NCHAR`. This is also what I get
in the cursor description after the `SELECT` invocation
before the `cursor.setinputsizes` in my working pure-cx_Oracle
example in my original posting. If I change `cx_Oracle.STRING`
to `cx_Oracle.NCHAR` in
`DefaultExecutionContext.set_input_sizes` in the debugger,
the Chinese characters appear in the database as expected.
See also
https://github.com/oracle/python-cx_Oracle/issues/119
If you don't have a suspicion why the wrong type for
the `NVARCHAR2` column is used, I could try to do more
debugging and hopefully find out why.
What do you think?
Best regards,
Stefan