How to SELECT computed data using SQLAlchemy?

573 views
Skip to first unread message

Daniel Grindrod

unread,
Dec 14, 2021, 5:40:43 AM12/14/21
to sqlalchemy
Hi all,

I'm working on a REST API which is built using Flask-SQLAlchemy and Connexion. I'm fairly new to SQLAlchemy, but it's been brilliant so far :) This API uses SQLAlchemy 1.3.16, and connects to an Oracle Database (12c 12.1.0.1.0 64bit). 

I'm having an issue generating the correct SQL from a SQLAlchemy query. I would really appreciate any help. The troublesome function is shown below.

def similar_matches(input_descriptor, threshold, lim=None, offset): 

  query = db.select([tbl, text(":q_descriptors as query_descriptors, comparison(descriptors, :q_descriptors) as similarity")bindparams(q_descriptor=input_descriptor).\
  where( text("comparison(descriptors, :q_descriptors) >=  q_threshold").bindparams(q_descriptor=input_descriptor, q_threshold = threshold) 

  res = query.execute().fetchall() 

  if len(res)=0 return '', 204 

  return MatchLimaScheme(many = True).dump(res)

This SQLAlchemy code takes two inputs (descriptor and threshold), and searches through each descriptor in an Oracle database, calculating a similarity measure  between the queried descriptor and each stored descriptor. All rows where similarity score >= threshold are returned in a JSON.

The above code works fine, but returns all results - whereas I want to also be able to include a .offset() and a .limit() (for lazy loading). The code above generates SQL along these lines:

SELECT ID, last_modified, descriptors, :q_descriptors as query_descriptors, comparison(descriptors, :q_descriptors) as similarity' FROM tbl WHERE compare(descriptors, :q_descriptors) >= :q_threshold

which works well. However, when I add .limit() or .offset() on the end of my query i.e.

query = db.select([tbl, text(":q_descriptors as query_descriptors, comparison(descriptors, :q_descriptors) as similarity" ).bindparams(q_descriptor=input_descriptor).where( text("comparison(descriptors, :q_descriptors) >= :q_threshold") .bindparams(q_descriptor=input_descriptor,q_threshold = threshold).limit(limit)

The SQL generated changes to be along these lines: 

SELECT ID, last_modified, descriptors FROM (SELECT tbl.ID as ID, tbl.last_modified as last_modified, tbl.descriptors as descriptors, :q_descriptors as query_descriptors, comparison(descriptors, :q_descriptors) as similarity) FROM tbl WHERE compare(descriptors, :q_descriptors) >= :q_threshold WHERE ROWNUM <= :q_limit

As a raw SQL query this is fine, but I'm no longer including the query_descriptors and similarity metrics in my SELECT clause. Thus I get a columnNotFoundError. How do I adjust the .select() function above so that my SQL looks more like:

SELECT ID, last_modified, descriptors, query_descriptors, similarity FROM (SELECT tbl.ID as ID, tbl.last_modified as last_modified, tbl.descriptors as descriptors,:q_descriptors as query_descriptors comparison(descriptors, :q_descriptors) as similarity' FROM tbl WHERE compare(descriptors, :q_descriptors) >= :q_threshold WHERE ROWNUM <= :q_limit OFFSET :q_offset

Basically, I'm looking to explicitly tell SQLAlchemy that I want to SELECT tbl AND query_descriptors AND similarity.

I've also been informed that it's bad practice to not include a .order_by() in these queries, but I don't think that is what's causing the issue here. It is on my to do list though.

Please let me know if I need to provide more information.

Thanks for any help,

Dan

Mike Bayer

unread,
Dec 14, 2021, 8:31:12 AM12/14/21
to noreply-spamdigest via sqlalchemy


On Tue, Dec 14, 2021, at 5:40 AM, 'Daniel Grindrod' via sqlalchemy wrote:
Hi all,

I'm working on a REST API which is built using Flask-SQLAlchemy and Connexion. I'm fairly new to SQLAlchemy, but it's been brilliant so far :) This API uses SQLAlchemy 1.3.16, and connects to an Oracle Database (12c 12.1.0.1.0 64bit). 

I'm having an issue generating the correct SQL from a SQLAlchemy query. I would really appreciate any help. The troublesome function is shown below.

def similar_matches(input_descriptor, threshold, lim=None, offset): 

  query = db.select([tbl, text(":q_descriptors as query_descriptors, comparison(descriptors, :q_descriptors) as similarity")bindparams(q_descriptor=input_descriptor).\
  where( text("comparison(descriptors, :q_descriptors) >=  q_threshold").bindparams(q_descriptor=input_descriptor, q_threshold = threshold) 

heya -

it's early here but I almost want to be able to tell a story about that pattern above, which has select(text("cola, colb, colc, ...))  in it.   It's kind of an "anti-unicorn" for me, as I've done many refactorings to the result-processing part of SQLAlchemy's engine and each time I do so, there's some internal handwringing over, "what if someone is SELECTING from a text() that has multiple columns comma-separated in them?", which specifically is a problem because it means we can't positionally link the columns we get back from the cursor to the Python expressions that are in the select() object, and each time it's like, "nah, nobody would do that", or, "nah, nobody *should* do that", but yet, as there's not a straightforward way to detect/warn for that, there's a whole set of code / commentary at https://github.com/sqlalchemy/sqlalchemy/blob/main/lib/sqlalchemy/engine/cursor.py#L325 which wonders if we'd ever see this.   

and here it is!  :)   the dark unicorn.    So, it's also the source of the issue here, because the Oracle dialect has to restructure the query to simulate limit/offset.   Soooo.... back into the barn w/ the unicorn and what we do here is make sure the select() has enough structure so that SQLAlchemy knows what's going on and here that would look like (note I'm making some syntactical assumptions about the code above which seems to be incomplete ):

from sqlalchemy import literal, func

query = db.select(
    [
        tbl,
        literal(input_descriptor).label("query_descriptors"),
        func.comparison(tbl.c.descriptors, q_descriptors).label("similarity")
    ]).
  where(
    func.comparison(tbl.c.descriptors, q_descriptors) >= threshold

  )

that way your select() will have .selected_columns entries for every column in "tbl" plus columns "query_descriptors" and "similarity", and these will export on outwards to the subquery that the Oracle dialect creates.









  res = query.execute().fetchall() 

  if len(res)=0 return '', 204 

  return MatchLimaScheme(many = True).dump(res)

This SQLAlchemy code takes two inputs (descriptor and threshold), and searches through each descriptor in an Oracle database, calculating a similarity measure  between the queried descriptor and each stored descriptor. All rows where similarity score >= threshold are returned in a JSON.

The above code works fine, but returns all results - whereas I want to also be able to include a .offset() and a .limit() (for lazy loading). The code above generates SQL along these lines:

SELECT ID, last_modified, descriptors, :q_descriptors as query_descriptors, comparison(descriptors, :q_descriptors) as similarity' FROM tbl WHERE compare(descriptors, :q_descriptors) >= :q_threshold

which works well. However, when I add .limit() or .offset() on the end of my query i.e.

query = db.select([tbl, text(":q_descriptors as query_descriptors, comparison(descriptors, :q_descriptors) as similarity" ).bindparams(q_descriptor=input_descriptor).where( text("comparison(descriptors, :q_descriptors) >= :q_threshold") .bindparams(q_descriptor=input_descriptor,q_threshold = threshold).limit(limit)

The SQL generated changes to be along these lines: 

SELECT ID, last_modified, descriptors FROM (SELECT tbl.ID as ID, tbl.last_modified as last_modified, tbl.descriptors as descriptors, :q_descriptors as query_descriptors, comparison(descriptors, :q_descriptors) as similarity) FROM tbl WHERE compare(descriptors, :q_descriptors) >= :q_threshold WHERE ROWNUM <= :q_limit

As a raw SQL query this is fine, but I'm no longer including the query_descriptors and similarity metrics in my SELECT clause. Thus I get a columnNotFoundError. How do I adjust the .select() function above so that my SQL looks more like:

SELECT ID, last_modified, descriptors, query_descriptors, similarity FROM (SELECT tbl.ID as ID, tbl.last_modified as last_modified, tbl.descriptors as descriptors,:q_descriptors as query_descriptors comparison(descriptors, :q_descriptors) as similarity' FROM tbl WHERE compare(descriptors, :q_descriptors) >= :q_threshold WHERE ROWNUM <= :q_limit OFFSET :q_offset

Basically, I'm looking to explicitly tell SQLAlchemy that I want to SELECT tbl AND query_descriptors AND similarity.

I've also been informed that it's bad practice to not include a .order_by() in these queries, but I don't think that is what's causing the issue here. It is on my to do list though.

Please let me know if I need to provide more information.

Thanks for any help,

Dan


--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper
 
 
To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.

Daniel Grindrod

unread,
Dec 14, 2021, 12:26:19 PM12/14/21
to sqlalchemy
Hi Michael,

Thanks for such a quick reply.
I enjoyed reading it! I actually inherited this API (I swear I'm not just making excuses!) from a colleague who left a few months earlier, so it's very much been a case of 'Figuring it out as I go along'.

Apologies for the incomplete code - despite it not being particularly exciting code, I wanted to double check that I'm allowed to post it publicly. 
So the original (complete) code for this function is as follows:

def similar_structure_matches(smiles, similarity_threshold): 

  struc_sim_query = db.select([structures_tbl, text(":q_smiles as query_smiles, jc_tanimoto(canonical_smiles, :q_smiles) as similarity").bindparams(q_smiles=smiles)]). \      where(text("jc_tanimoto(canonical_smiles, :q_smiles) >= :q_sim"). bindparams(q_smiles=smiles, q_sim=similarity_threshold)) 

  struc_sim_res = struc_sim_query.execute().fetchall()

  if len(struc_sim_res) ==   0: 
    return '', 204 

returnMatchLimaSchema(many=True).dump(struc_sim_res)

The above code is used to generate tanimoto (similarity) scores for the queried structure against each structure in the database( SMILES describe chemical structures).
As I understand it, the jc_tanimoto function comes from the Chemaxon Cartridge which we have installed on our Oracle server (Cartridge API | ChemAxon Docs).

I'm not entirely sure how to call this function, without it being wrapped by text().
As I understand it, the code you sent across would be applying the comparison (now jc_tanimoto) function in the Python; not within Oracle itself (of course, that was impossible for you to predict with the incomplete code I sent across).

Could you please advise on how to correctly structure this query?

Thanks again,
Dan

Mike Bayer

unread,
Dec 14, 2021, 1:16:02 PM12/14/21
to noreply-spamdigest via sqlalchemy


On Tue, Dec 14, 2021, at 12:26 PM, 'Daniel Grindrod' via sqlalchemy wrote:
Hi Michael,

Thanks for such a quick reply.
I enjoyed reading it! I actually inherited this API (I swear I'm not just making excuses!) from a colleague who left a few months earlier, so it's very much been a case of 'Figuring it out as I go along'.

Apologies for the incomplete code - despite it not being particularly exciting code, I wanted to double check that I'm allowed to post it publicly. 
So the original (complete) code for this function is as follows:

def similar_structure_matches(smiles, similarity_threshold): 

  struc_sim_query = db.select([structures_tbl, text(":q_smiles as query_smiles, jc_tanimoto(canonical_smiles, :q_smiles) as similarity").bindparams(q_smiles=smiles)]). \      where(text("jc_tanimoto(canonical_smiles, :q_smiles) >= :q_sim"). bindparams(q_smiles=smiles, q_sim=similarity_threshold)) 

  struc_sim_res = struc_sim_query.execute().fetchall()

  if len(struc_sim_res) ==   0: 
    return '', 204 

returnMatchLimaSchema(many=True).dump(struc_sim_res)
The above code is used to generate tanimoto (similarity) scores for the queried structure against each structure in the database( SMILES describe chemical structures).
As I understand it, the jc_tanimoto function comes from the Chemaxon Cartridge which we have installed on our Oracle server (Cartridge API | ChemAxon Docs).

I'm not entirely sure how to call this function, without it being wrapped by text().
As I understand it, the code you sent across would be applying the comparison (now jc_tanimoto) function in the Python; not within Oracle itself (of course, that was impossible for you to predict with the incomplete code I sent across).

Could you please advise on how to correctly structure this query?

There's a construct in SQLAlchemy called func that renders a SQL -function-like syntax for any arbitrary word, like this:

f>>> from sqlalchemy import func
>>> from sqlalchemy import select
>>> print(select([func.jc_tanimoto('some data').label("my_label")]))
SELECT jc_tanimoto(:jc_tanimoto_1) AS my_label

so as long as there's no unusual SQL syntaxes in play you can use func.<name> to generate SQL for any SQL function with parameters.





Daniel Grindrod

unread,
Dec 14, 2021, 1:57:05 PM12/14/21
to sqlalchemy
That worked brilliantly, thanks so much for your help!
Very much appreciated :)
Reply all
Reply to author
Forward
0 new messages