>>> engine = sqlalchemy.engine.create_engine("postgresql:///xmsdb", echo=True)
>>> Session = sqlalchemy.orm.sessionmaker(bind=engine)
>>> session = Session()
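(The profiled() helper used below isn't shown here; a minimal sketch of such a cProfile-based context manager -- the exact helper may differ -- could look like this:)

import contextlib
import cProfile
import pstats

@contextlib.contextmanager
def profiled():
    # Profile everything inside the with-block and print the top entries
    # sorted by cumulative time.
    pr = cProfile.Profile()
    pr.enable()
    try:
        yield
    finally:
        pr.disable()
        pstats.Stats(pr).sort_stats("cumulative").print_stats(30)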
The query:
>>> q = session.query(Volume.guid, Volume.name, Volume.index)
First, query using the ORM layer:
>>> with profiled():
... r = q.all()
...
2015-10-21 20:24:20,539 INFO sqlalchemy.engine.base.Engine SELECT volumes.guid AS volumes_guid, mom_objects.name AS mom_objects_name, mom_objects.index AS mom_objects_index
FROM mom_objects JOIN volumes ON mom_objects.guid = volumes.guid
2015-10-21 20:24:20,539 INFO sqlalchemy.engine.base.Engine {}
147901 function calls (147890 primitive calls) in 0.396 CPU seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.007 0.007 0.396 0.396 /usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py:2286(all)
8191 0.055 0.000 0.357 0.000 /usr/lib64/python2.6/site-packages/sqlalchemy/orm/loading.py:26(instances)
24570 0.060 0.000 0.123 0.000 /usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py:3468(proc)
1 0.000 0.000 0.093 0.093 /usr/lib64/python2.6/site-packages/sqlalchemy/engine/result.py:778(fetchall)
1 0.000 0.000 0.089 0.089 /usr/lib64/python2.6/site-packages/sqlalchemy/engine/result.py:747(_fetchall_impl)
1 0.028 0.028 0.089 0.089 {method 'fetchall' of 'psycopg2._psycopg.cursor' objects}
8190 0.043 0.000 0.085 0.000 /usr/lib64/python2.6/site-packages/sqlalchemy/util/_collections.py:56(__new__)
24579 0.045 0.000 0.063 0.000 /usr/lib64/python2.6/site-packages/sqlalchemy/sql/annotation.py:85(__hash__)
16380 0.033 0.000 0.061 0.000 /usr/lib64/python2.6/encodings/utf_8.py:15(decode)
1 0.000 0.000 0.032 0.032 /usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py:2399(__iter__)
1 0.000 0.000 0.031 0.031 /usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py:2413(_execute_and_instances)
1 0.000 0.000 0.031 0.031 /usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py:652(execute)
1 0.000 0.000 0.031 0.031 /usr/lib64/python2.6/site-packages/sqlalchemy/sql/elements.py:316(_execute_on_connection)
1 0.000 0.000 0.031 0.031 /usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py:781(_execute_clauseelement)
1 0.000 0.000 0.030 0.030 /usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py:868(_execute_context)
1 0.000 0.000 0.030 0.030 /usr/lib64/python2.6/site-packages/sqlalchemy/engine/default.py:434(do_execute)
1 0.030 0.030 0.030 0.030 {method 'execute' of 'psycopg2._psycopg.cursor' objects}
16380 0.028 0.000 0.028 0.000 {_codecs.utf_8_decode}
24579 0.018 0.000 0.018 0.000 {hash}
8195 0.016 0.000 0.016 0.000 {method 'update' of 'dict' objects}
8191 0.015 0.000 0.015 0.000 {zip}
The same query executed directly on the engine:
>>> with profiled():
... r = engine.execute(str(q)).fetchall()
...
2015-10-21 20:29:32,336 INFO sqlalchemy.engine.base.Engine SELECT volumes.guid AS volumes_guid, mom_objects.name AS mom_objects_name, mom_objects.index AS mom_objects_index
FROM mom_objects JOIN volumes ON mom_objects.guid = volumes.guid
2015-10-21 20:29:32,337 INFO sqlalchemy.engine.base.Engine {}
33314 function calls (33303 primitive calls) in 0.116 CPU seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.092 0.092 /usr/lib64/python2.6/site-packages/sqlalchemy/engine/result.py:778(fetchall)
1 0.000 0.000 0.087 0.087 /usr/lib64/python2.6/site-packages/sqlalchemy/engine/result.py:747(_fetchall_impl)
1 0.028 0.028 0.087 0.087 {method 'fetchall' of 'psycopg2._psycopg.cursor' objects}
16380 0.032 0.000 0.059 0.000 /usr/lib64/python2.6/encodings/utf_8.py:15(decode)
16380 0.027 0.000 0.027 0.000 {_codecs.utf_8_decode}
1 0.000 0.000 0.023 0.023 /usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py:1665(execute)
1 0.000 0.000 0.023 0.023 /usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py:652(execute)
1 0.000 0.000 0.023 0.023 /usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py:846(_execute_text)
1 0.000 0.000 0.023 0.023 /usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py:868(_execute_context)
1 0.000 0.000 0.022 0.022 /usr/lib64/python2.6/site-packages/sqlalchemy/engine/default.py:434(do_execute)
1 0.022 0.022 0.022 0.022 {method 'execute' of 'psycopg2._psycopg.cursor' objects}
1 0.004 0.004 0.004 0.004 /usr/lib64/python2.6/site-packages/sqlalchemy/engine/result.py:762(process_rows)
1 0.000 0.000 0.001 0.001 /usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py:2923(__str__)
1 0.000 0.000 0.001 0.001 /usr/lib64/python2.6/site-packages/sqlalchemy/sql/elements.py:476(__str__)
1 0.000 0.000 0.001 0.001 <string>:1(<lambda>)
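(As an aside, the same SQL can be run through the engine without stringifying the Query by executing its underlying Core statement; a sketch, reusing the q and profiled() from above:)

>>> stmt = q.statement  # the Core SELECT behind the ORM query
>>> with profiled():
...     r = engine.execute(stmt).fetchall()
...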
So, question #1:
Given that in both cases I'm querying columns and essentially just getting rows as a result, why is going through the ORM layer roughly 3.5x slower (0.396 vs. 0.116 seconds)?
Next, just for sanity's sake, I compared running the query on the raw dbapi connection obtained from sqlalchemy's engine against running it on a freshly created psycopg2 connection (which should theoretically be exactly the same):
>>> conn = engine.connect()
>>> raw_conn = conn.connection
>>> curs = raw_conn.cursor()
>>> with profiled():
... curs.execute(str(q))
... r = curs.fetchall()
...
33071 function calls (33060 primitive calls) in 0.113 CPU seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.029 0.029 0.091 0.091 {method 'fetchall' of 'psycopg2._psycopg.cursor' objects}
16380 0.034 0.000 0.062 0.000 /usr/lib64/python2.6/encodings/utf_8.py:15(decode)
16380 0.028 0.000 0.028 0.000 {_codecs.utf_8_decode}
1 0.020 0.020 0.020 0.020 {method 'execute' of 'psycopg2._psycopg.cursor' objects}
1 0.000 0.000 0.002 0.002 /usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py:2923(__str__)
1 0.000 0.000 0.001 0.001 /usr/lib64/python2.6/site-packages/sqlalchemy/sql/elements.py:476(__str__)
1 0.000 0.000 0.001 0.001 <string>:1(<lambda>)
>>> import psycopg2
>>> conn = psycopg2.connect("dbname=xmsdb")
>>> curs = conn.cursor()
>>> with profiled():
... curs.execute(str(q))
... r = curs.fetchall()
...
311 function calls (300 primitive calls) in 0.029 CPU seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.024 0.024 0.024 0.024 {method 'execute' of 'psycopg2._psycopg.cursor' objects}
1 0.003 0.003 0.003 0.003 {method 'fetchall' of 'psycopg2._psycopg.cursor' objects}
1 0.000 0.000 0.001 0.001 /usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py:2923(__str__)
1 0.000 0.000 0.001 0.001 /usr/lib64/python2.6/site-packages/sqlalchemy/sql/elements.py:476(__str__)
1 0.000 0.000 0.001 0.001 <string>:1(<lambda>)
These should theoretically be exactly the same, but the sqlalchemy version spends additional time doing UTF-8 decoding, which the raw psycopg2 connection doesn't. With the default connection parameters, the psycopg2 connection simply returns byte strings, while the sqlalchemy connection decodes them to unicode as well.
I was able to make the call via sqlalchemy match the psycopg2 call time-wise, by modifying the connection parameters:
>>> engine = sqlalchemy.engine.create_engine("postgresql:///xmsdb", echo=True, use_native_unicode=False, client_encoding='utf8')
That caused the query via sqlalchemy to return byte strings as well (and also reduced the query time to ~30ms).
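(The opposite direction is also possible: the raw psycopg2 connection can be told to decode text columns to unicode by registering its UNICODE typecaster, which is roughly what sqlalchemy's native-unicode support arranges on each connection. A sketch:)

>>> import psycopg2
>>> import psycopg2.extensions
>>> # register globally so new connections decode text columns to unicode
>>> psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
>>> conn = psycopg2.connect("dbname=xmsdb")
>>> curs = conn.cursor()
>>> with profiled():
...     curs.execute(str(q))
...     r = curs.fetchall()  # rows now contain unicode objects
...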
So, question #2, given the performance impact of decoding all strings, what are the possible implications of simply leaving all strings as byte strings? I do want my software to support the full utf-8 character set.
I'm also curious about the apparent difference in the default behavior (to decode vs. not decode). The database itself does hold unicode strings -- for example, I changed the name field of one of the items to u'volßßßß'. The response via the two different connections was:
(u'0984e1b061b248b3b4705b198cadc1eb', u'vol\xdf\xdf\xdf\xdf', 487)
vs
('0984e1b061b248b3b4705b198cadc1eb', 'vol\xc3\x9f\xc3\x9f\xc3\x9f\xc3\x9f', 487)
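(The second result is just the UTF-8 encoding of the first; decoding the byte string by hand gives the same unicode value back:)

>>> 'vol\xc3\x9f\xc3\x9f\xc3\x9f\xc3\x9f'.decode('utf-8')
u'vol\xdf\xdf\xdf\xdf'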
That's a lot to digest... Thanks in advance for the help, and for a great library!
Uri
from sqlalchemy import Column, Integer, Unicode
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Customer(Base):
    __tablename__ = "customer"
    id = Column(Integer, primary_key=True)
    name = Column(Unicode(255))
    description = Column(Unicode(255))