On 03/31/2016 06:40 PM, Robert Smith wrote:
> Mike Bayer:
>
> Thank you for your response. I wasn't really asking whether the
> optimization I described above (using `%` instead of `similarity`) is
> correct or not. Based on some resources (e.g. Super Fuzzy Searching on
> PostgreSQL
> <
http://webcache.googleusercontent.com/search?q=cache:59hB5zYwRzcJ:bartlettpublishing.com/site/bartpub/blog/3/entry/350+&cd=9&hl=en&ct=clnk&gl=mx>,
> FUZZY STRING MATCHING WITH TRIGRAM AND TRIGRAPHS
> <
http://www.postgresonline.com/journal/archives/169-Fuzzy-string-matching-with-Trigram-and-Trigraphs.html>),
> I think it is correct although I wouldn't mind being proven wrong. I was
> mainly asking, why sqlalchemy doesn't seem to take advantage of this
> improvement?
The Core keeps everyone sane by emitting to as much a degree as is
possible exactly the SQL you tell it to. I see a hardcoded number ".5"
in your optimization as well as a significant structural difference,
which I don't think it would be reasonable for Core to take it upon
itself to emit with no intent coming from the user.
More generally, func.XYZ() has no awareness of most functions except a
small handful just for the purposes of helping with return types.
Similarly, there are no attempts to dig into the expressions used within
functions and try to make decisions based on indexes associated with the
elements of those expressions; parsing SQL and applying indexes to
expressions is the job of the database itself and it wouldn't be
reasonable for SQLAlchemy to start attempting to duplicate all that
extremely rich, complex, and idiosyncratic logic that we get for free
from the database. If this optimization is truly applicable by default
I'd ask why doesn't Postgresql itself apply it automatically behind the
scenes; if the database engine is failing to take advantage of its own
optimizations which can be established as appropriate by default without
any additional intent from the user, that's a bug on their end.
>
> Your second comment intrigued me: `though this is not really important
> in the case of automapping as indexes do not affect ORM mappings.`. I'm
> pretty new to sqlalchemy, so I think I don't understand what you mean.
> Can you describe this a bit more?
An ORM mapping links the attributes in an object-oriented class to rows
in a database table. The purpose of this mapping is to generate
SELECT, INSERT, UPDATE, and DELETE statements to be emitted to the
database, and to retrieve result data from these statements in the form
of database rows. None of these operations have any requirement to
explicitly state the names of indexes nor are any indexes required for
any of those operations to function. So awareness of indexes on the
SQLAlchemy side for the purposes of emitting and retrieving data from
DQL or DML is unnecessary. The SQLAlchemy Index() construct is only
useful for DDL (e.g. CREATE INDEX and related statements), which is a
concern separate from object relational mapping.
>
> Regards
>
> On Thursday, March 31, 2016 at 7:43:11 AM UTC-6, Mike Bayer wrote:
>
>
>
> On 03/30/2016 09:50 PM, Robert Smith wrote:
> > I'm using sqlalchemy 1.0.12 with postgreSQL 9.4.6 with the pg_trgm
> > extension enabled. Basically, I'm performing a similarity-based
> query on
> > a single column:
> >
> > In [26]:
> >
> str(session.query(Model).order_by(desc(func.similarity(Model.description,
> 'string'))).limit(100))
> > Out[26]: 'SELECT
model.id <
http://model.id> AS model_id,
> model.description AS
> > model_description, \nFROM model ORDER BY
> similarity(model.description,
> > :similarity_1) DESC\n LIMIT :param_1'
> >
> > This query works okay, but I'm worried that looking at the query
> that is
> > being used, it doesn't seem to take advantage of some performance
> > improvements that are available in this kind of situation. As a
> simple
> > example, I noticed that people recommend not to use `similarity` and
> > instead rely on `%` or alternatively, use `WHERE
> model.description %
> > 'string' AND similarity(model.description, 'string') > 0.5`. This is
> > apparently related to the efficient use of a "gist" index.
>
> this is more of a Postgresql optimization question so I dont have any
> insight on that.
>
> >
> > Another question: In the case of having an index for the trigram
> > matches, does sqlalchemy pick up that information automatically
> when I'm
> > using `automap_base()` to reflect my database?
>
> Table reflection will reflect table indexes, though this is not really
> important in the case of automapping as indexes do not affect ORM
> mappings.
>
>
>