SQL Server, OPENJSON and large blobs of JSON as placeholder bind values


Paul Harrington

Nov 30, 2020, 10:03:11 AM11/30/20
to sqlalchemy
Hello Community!
I have not posted here for several years, as I have been getting along just fine with the excellent SQLAlchemy toolkit and its documentation. However, I am trying to do something a bit fancy, since all the SQL Server dataservers I use are running versions that support OPENJSON.

A number of our applications have used staging techniques to copy tables of data to the dataserver so that they can be used in JOINs (e.g. send up 10,000 identifiers into a staging table rather than doing RBAR (row-by-agonizing-row) lookups or batching into IN clauses). This technique is IO intensive and has quite a bit of overhead. An alternative approach is to serialize the data into a big blob of JSON, send it over as a single scalar, and then expand it back out to a table via a CTE on the dataserver. We have found the performance of this to be excellent from scripting languages, presumably because we are avoiding so many API calls.

However, I have not found a way to do this cleanly from SQLAlchemy and end up using sql.text a lot. Also, because it does not seem possible to re-use the same placeholder multiple times in a statement, if we want to expand out the JSON multiple times (e.g. if the JSON is nested and has multiple tables in it), we work around that using [sys].[sp_executesql]. Please see the snippet below and the complete script (with create_engine redacted) attached.

What I am looking for is some guidance on how to use SQLAlchemy to produce CTEs that reference the same variable (@blob_o_json) multiple times without having to pass the same value in multiple times.
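
For context, here is the shape of the inner statement (the real inner_sql is in the attached script; the column list and the two trailing SELECTs here are illustrative). Note that a CTE is scoped to a single statement, which is why @bind_value_blob_o_json ends up referenced once per statement:

inner_sql = u'''
WITH expanded AS (
    SELECT rn, foo, bar, fruit, flavor
    FROM OPENJSON(@bind_value_blob_o_json)
         WITH (rn int '$.rn', foo int '$.foo', bar int '$.bar',
               fruit varchar(100) '$.fruit', flavor varchar(100) '$.flavor')
)
SELECT COUNT(*) AS n FROM expanded;

WITH expanded AS (
    SELECT rn, fruit
    FROM OPENJSON(@bind_value_blob_o_json)
         WITH (rn int '$.rn', fruit varchar(100) '$.fruit')
)
SELECT TOP 10 rn, fruit FROM expanded;
'''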

thanks in advance!
pjjH



import json
import logging

import six
from sqlalchemy import sql

# `engine` (create_engine redacted) and `inner_sql` are defined in the attached script.

# TODO: formulate this as a SQLAlchemy expression (is there an exec() element?)
outer_sql = r'''
EXEC [sys].[sp_executesql] @statement = :sql,
                           @params = N'@bind_value_blob_o_json varchar(max)',
                           @bind_value_blob_o_json = :json_blob
'''

# we prepare the *outer* statement, which has a single command in it
statement = sql.text(outer_sql)

# now execute it, passing in the unicode value of the inner SQL

logging.basicConfig(format='%(asctime)s %(message)s')
logging.getLogger().setLevel(logging.DEBUG)

nrows = [1000, 10000, 100000, 200000, 500000]
for n in nrows:
    # cons up a list of dicts to represent a single 'inline' table with lots of rows
    logging.debug("passing %d items" % (n,))
    blob_o_json = json.dumps(
        [dict(rn=x, foo=x * 2, bar=x - 1, fruit='banana', flavor='yummy')
         for x in range(0, n)])
    # Note: the named arguments on the next line correspond to the :sql and
    # :json_blob placeholders in outer_sql
    result = engine.execute(statement, sql=six.text_type(inner_sql),
                            json_blob=blob_o_json)
    rs = result.fetchall()
    # logging.debug((len(rs), rs[-4:]))

    # Drop down to the DBAPI to process each result-set. The application should
    # know how many result-sets will be present; I think it is likely that the
    # vast majority of queries will just have a single result-set.

    # I did not know how to get to anything other than the first result-set ...
    # I thought there was probably some pyodbc magic to make it work with
    # SQLAlchemy and -- sure enough! -- there is:

    # cursor = result.cursor
    # rs1 = cursor.fetchall()
    # cursor.nextset()
    # rs2 = cursor.fetchall()
    # logging.debug((len(rs1), rs1[-4:], rs2[-4:]))

    logging.debug("finished %d" % (n,))

logging.debug("done")
Attachment: sqlalchemy_execute_sql_demo.py

Mike Bayer

Nov 30, 2020, 1:08:14 PM11/30/20
to noreply-spamdigest via sqlalchemy
hey there -

Took a quick look, and we don't support anything with variable declarations, multiple statements in a single string, or multiple result sets. All of that is outside the scope of the SQLAlchemy expression language.

Since what you're doing is extremely specific to a certain database, if your approach works, what is the problem you are seeking to solve?

Paul Harrington

Nov 30, 2020, 1:39:27 PM11/30/20
to sqlalchemy
Hi Mike!
I am looking for advice on a 'reasonably good way' to do each of the different parts of the solution. What I have at the moment is all hand-rolled SQL, and I am looking for a way to move some of it into SQLAlchemy so that the technique is more composable.
  1. Is there any SQL Server dialect support for CROSS APPLY OPENJSON? Any tie-ins with JSON Schema? How do you suggest making a CTE given a table-like JSON schema?
  2. What is the best way to emulate multi-statement batches? Make individual statements in SA, compile them to text, and reassemble them with sql.text (see the sketch just after this list)? This is the equivalent of what is referred to as 'inner_sql' in my example program.
  3. Is there a SQLAlchemy construct for EXEC sproc? How might I rewrite the equivalent of what is assigned to outer_sql in my example program?
  4. Is dropping down to dbapi/pyodbc still the best way to handle multiple result-sets?
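
Concretely, something like this sketch is what I mean in question 2 by compiling individual statements to text and reassembling them (the table and columns are made up; literal_binds is just for illustration):

from sqlalchemy.sql import column, select, table
from sqlalchemy.dialects import mssql

t = table('expanded', column('rn'), column('fruit'))  # hypothetical table
s1 = select([t.c.rn]).where(t.c.fruit == 'banana')
s2 = select([t.c.fruit])
dialect = mssql.dialect()
batch_sql = ";\n".join(
    str(s.compile(dialect=dialect, compile_kwargs={'literal_binds': True}))
    for s in (s1, s2))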
Hope you are doing well! Looking forward to getting back up to speed on SQLAlchemy and seeing what has been incorporated since 0.8!

pjjH

Mike Bayer

Nov 30, 2020, 1:52:03 PM11/30/20
to noreply-spamdigest via sqlalchemy


On Mon, Nov 30, 2020, at 1:39 PM, Paul Harrington wrote:
Hi Mike!
I am looking for advice on a 'reasonably good way' to do each of the different parts of the solution. What I have at the moment is all hand-rolled SQL, and I am looking for a way to move some of it into SQLAlchemy so that the technique is more composable.
  1. Is there any SQL Server dialect support for CROSS APPLY OPENJSON? Any tie-ins with JSON Schema? How do you suggest making a CTE given a table-like JSON schema?

I don't know anything about OPENJSON, so I don't have any insight on how that might work.


  2. What is the best way to emulate multi-statement batches?

It depends on what you are using them for. In your case it looks like you need them to support SQL-side variables, which we have no support for; this is imperative programming, and SQLAlchemy's expression language draws the line where declarative becomes imperative.

  2. (cont.) Make individual statements in SA, compile them to text, and reassemble them with sql.text? This is the equivalent of what is referred to as 'inner_sql' in my example program.

Yeah, I saw that you were doing that; more robustly you'd use a @compiles recipe, see https://docs.sqlalchemy.org/en/13/core/compiler.html .
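
For example, a minimal sketch of such a recipe for OPENJSON ... WITH (the OpenJSON class, its constructor arguments, and the column list are all made up for illustration; this is not an existing dialect feature):

from sqlalchemy import bindparam
from sqlalchemy.sql import column, select
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import FunctionElement


class OpenJSON(FunctionElement):
    # hypothetical construct; not part of the mssql dialect
    name = 'openjson'

    def __init__(self, json_expr, col_defs):
        self.col_defs = col_defs  # list of (column_name, sql_type) pairs
        super(OpenJSON, self).__init__(json_expr)


@compiles(OpenJSON)
def compile_openjson(element, compiler, **kw):
    json_expr = list(element.clauses)[0]
    with_clause = ", ".join(
        "[%s] %s '$.%s'" % (name, sqltype, name)
        for name, sqltype in element.col_defs)
    return "OPENJSON(%s) WITH (%s)" % (
        compiler.process(json_expr, **kw), with_clause)


# usage: treat the construct as a FROM object via .alias()
oj = OpenJSON(bindparam('json_blob'),
              [('rn', 'int'), ('fruit', 'varchar(100)')]).alias('j')
stmt = select([column('rn'), column('fruit')]).select_from(oj)
# renders roughly:
#   SELECT rn, fruit FROM OPENJSON(:json_blob)
#   WITH ([rn] int '$.rn', [fruit] varchar(100) '$.fruit') AS j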


  3. Is there a SQLAlchemy construct for EXEC sproc? How might I rewrite the equivalent of what is assigned to outer_sql in my example program?

The Python DBAPI only guarantees that EXEC is available via the callproc() interface: https://www.python.org/dev/peps/pep-0249/#callproc which SQLAlchemy does not include in its facade, as there's not much point. Our guidelines for using callproc are at https://docs.sqlalchemy.org/en/13/core/connections.html#calling-stored-procedures . To the degree that your Python driver (pyodbc?) supports EXEC passed into cursor.execute(), you could send it to regular connection.execute() as well.
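
The recipe from that section looks roughly like this (procedure name and arguments are placeholders; note that pyodbc's callproc support is limited, so the EXEC-through-execute() route is often the practical one):

connection = engine.raw_connection()
try:
    cursor = connection.cursor()
    cursor.callproc("my_stored_procedure", ["x", "y"])
    results = list(cursor.fetchall())
    cursor.close()
    connection.commit()
finally:
    connection.close()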



  4. Is dropping down to dbapi/pyodbc still the best way to handle multiple result-sets?

Yes. There was a proposal for this long ago, but multiple result sets are really not a thing that is commonly used outside of SQL Server, and then only for very special cases. It's not worth the extra complexity, and luckily I was able to avoid dealing with it when I did the 1.4 refactoring of the result object.
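
A cleaned-up version of the commented-out block from the first post, looping over however many result sets come back (a sketch; assumes pyodbc, whose cursor.nextset() returns True while more result sets remain, and batch_sql / params stand in for the demo's statement and bind values):

raw = engine.raw_connection()
try:
    cursor = raw.cursor()
    cursor.execute(batch_sql, params)  # batch_sql and params are placeholders
    while True:
        rows = cursor.fetchall()
        logging.debug("result set with %d rows" % len(rows))
        if not cursor.nextset():
            break
finally:
    raw.close()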


Hope you are doing well! Looking forward to getting back up to speed on SQLAlchemy and seeing what has been incorporated since 0.8!

Oh, quite a lot! Just none of what you're working on above :(





Paul Harrington

Nov 30, 2020, 2:22:08 PM11/30/20
to sqlalchemy
OPENJSON is awesome! I think you may find it useful as a performance optimization for persisting a session with thousands of dirty objects of the same class: you could serialize the state as a JSON document, send it over to the server as a single scalar, and 'inflate' it back into rowsets on the server side.
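
For instance (a sketch only; the [widgets] table and its columns are made up), flushing dirty flavor values for thousands of rows could be a single round trip:

update_sql = u'''
UPDATE w
SET    w.flavor = j.flavor
FROM   [widgets] AS w
JOIN   OPENJSON(@dirty_state)
       WITH (id int '$.id', flavor varchar(100) '$.flavor') AS j
  ON   j.id = w.id
'''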

Negative as your answers were, they are still very useful, as now I know what needs to be done.

pjjH


From the SQL Server documentation: OPENJSON is a table-valued function that parses JSON text and returns objects and properties from the JSON input as rows and columns. In other words, OPENJSON provides a rowset view over a JSON document. You can explicitly specify the columns in the rowset and the JSON property paths used to populate the columns. Since OPENJSON returns a set of rows, you can use OPENJSON in the FROM clause of a Transact-SQL statement just as you can use any other table, view, or table-valued function.

Mike Bayer

Nov 30, 2020, 6:19:50 PM11/30/20
to noreply-spamdigest via sqlalchemy
I had a notion of writing a server-side ORM, likely for PostgreSQL, where we'd leverage PostgreSQL's Python scripting platform to do something interesting along those lines.

Sorry my answers were so negative; those are just all very SQL-Server-esque patterns, which are fine but don't have much applicability elsewhere!

Paul Harrington

Nov 30, 2020, 7:42:06 PM11/30/20
to sqlalchemy
No worries! I will have a shot at some of them myself. I think it will be much easier than my first Python project back in 2008, which was writing a Sybase backend for SQLAlchemy!