I'm dealing with a bunch of code with hand-rolled SQL query strings,
and as the first step was going to use connections from SqlAlchemy's
connection pool, as I change the queries one-by-one. The only problem
is that our main queries get 500 Number columns and SqlAlchemy is
converting them to Decimals, slowing the queries down by a factor of
10.
Here's an example
>>> engine = sqlalchemy.create_engine("oracle+cx_oracle://"+connString)
>>> conn = engine.pool.connect()
>>> cursor = conn.cursor()
>>> cursor.execute("""SELECT * FROM MY_TABLE""")
>>> r = cursor.fetchone()
>>> r[-1]
Decimal('0.878935370620606')
The conversion is done via the outputtypehandler of the connection
object. I can circumvent it by either setting
>>> cursor.connection.outputtypehandler = None
or
>>> cursor.outputtypehandler = lambda *args : None
but is there a better, more standard way to convert the Numbers to
floats and not Decimals for arbitrary queries?
(Setting engine.dialect.supports_native_decimal to False doesn't work.
I believe the dialect is creating the output type handler before any
connection is opened.)
--
--Anthony
The output handler is there because otherwise cx_oracle immediately converts to Python float, which is then a lossy format. So we need to turn that off on cx_oracle's side immediately. There's a lot of discussion of this at http://docs.sqlalchemy.org/en/latest/dialects/oracle.html#precision-numerics .
My first suggestion would be to just use cdecimal. That way performance would not be an issue - I patch it in at program start time using the example at http://docs.sqlalchemy.org/en/latest/core/types.html#sqlalchemy.types.Numeric . I've been using this approach in production financial applications for several years without issue.
If you use the Float() type, the Decimal will be coerced into a regular float(), but you're looking for performance here so that's not the solution.
There's not a public API right now to turn off this handling - it would imply the _OracleNumeric type and other parts of the cx_oracle dialect would need to be further complicated to support two modes of operation, and it was enormously difficult to get precision numeric round trips as it is. A monkeypatch that would force it off would be:
engine = create_engine(...)
engine.dialect._to_decimal = float
another way you could do it would be to use a connection pool event. Using the connect event:
from sqlalchemy import event
@event.listens_for(engine, "connect")
def connect(connection, rec):
connection.outputtypehandler = None
Note that disabling the outputtypehandler will also mess up unicode handling, unless you replace it with another output handler that returns a "unicode" cursor.var().
On Sun, Feb 5, 2012 at 9:03 PM, Michael Bayer <mik...@zzzcomputing.com> wrote:
> My first suggestion would be to just use cdecimal. That way performance would not be an issue - I patch it in at program start time using the example at http://docs.sqlalchemy.org/en/latest/core/types.html#sqlalchemy.types.Numeric . I've been using this approach in production financial applications for several years without issue.
That looks ideal. I'll have to see about adding that to our systems.
> If you use the Float() type, the Decimal will be coerced into a regular float(), but you're looking for performance here so that's not the solution.
Does that mean when I eventually I set up a table with a column of
type Float(asdecimal=False), I'll still be hit by the conversion
penalty as the read data is converted to a Decimal first by the
connection, and then to a Float by whatever handles the columns? It
sounds like yes.
> There's not a public API right now to turn off this handling - it would imply the _OracleNumeric type and other parts of the cx_oracle dialect would need to be further complicated to support two modes of operation, and it was enormously difficult to get precision numeric round trips as it is. A monkeypatch that would force it off would be:
>
> engine = create_engine(...)
> engine.dialect._to_decimal = float
>
> another way you could do it would be to use a connection pool event. Using the connect event:
>
> from sqlalchemy import event
> @event.listens_for(engine, "connect")
> def connect(connection, rec):
> connection.outputtypehandler = None
>
> Note that disabling the outputtypehandler will also mess up unicode handling, unless you replace it with another output handler that returns a "unicode" cursor.var().
I'm not too worried about unicode conversions. All the text in the
database should be ASCII, and I've seen no attempts at unicode
handling anywhere in the code base.
If I do shut off the outputtypehandler on the connection, will that
cause any other problems as I start adding Table objects with the
appropriate Columns? Will the connection outputtypehandler be reset
when it goes back into the pool? Or should I just keep one connection
from returning to the pool solely for these old style queries?
--
--Anthony
>
> If I do shut off the outputtypehandler on the connection, will that
> cause any other problems as I start adding Table objects with the
> appropriate Columns? Will the connection outputtypehandler be reset
> when it goes back into the pool? Or should I just keep one connection
> from returning to the pool solely for these old style queries?
The outputtypehandler should only be set on the Connection by SQLAlchemy using the same event I gave you, the "connect" event, and it should occur before your handler would get to it. So whatever state you set on it should remain. Without the handler, you get back datatypes like str instead of unicode, and float instead of Decimal. The only area that there would be some dependency on the type returned is when a TypeEngine, like String, Unicode, or Numeric are in use. The String type should be fine. The cx_oracle-specific implementation of numeric here, which is sqlalchemy.dialects.oracle.cx_oracle._OracleNumeric, looks like it might assume it's getting a Decimal or string back if that supports_native_decimal flag is turned on, so if you're using that type you might need to flip that flag off, or do an outputtypehandler similar to the one that's there which just returns decimals as strings.
Attached is a patch that allows "coerce_to_decimal" to be passed to create_engine() and will disable the output handling if passed as False. Do you want to give it a try, I can commit it for 0.7.6 if it does the trick.
It passed my very cursory testing. (Select a Number column, and the
resulting type is float.)
It looks roughly 20-30% slower than using cx_Oracle without SqlAlchemy
(or deactivating the outputtypehandler completely). That compares to
about 100% slower for using cdecimal, and 900% for using the stdlib
decimal.
I'll probably still stick with my hack though. We should definitely
head towards using decimal: I don't think the original writers
realized they were losing precision, and most of those number fields
are going to go away in the redesign. Leaving my hack means that the
conversion to decimal will only hit code as I update it.
But if there are issues, it's sure nice to have that switch in the patch.
--
--Anthony