too many SQL variables in "in_"


mando

Sep 5, 2014, 2:30:37 AM
to sqlal...@googlegroups.com
Hi to all,

I wrote a method like this to reuse the code across many tables at the same time [0].

But with more than 1000 records, SQLite doesn't accept that many ids inside .in_(id_list).

How can I filter, split, or otherwise manage it?

Thanks a lot and best regards,
Luca


[0]
def query_sort(self, id_list, op, to, tc, idn):
    self.order_params = op    # sorting parameters
    self.type_order = to      # "asc" or "desc"
    self.table_class = tc     # the name of the mapper class
    self.id_name = idn        # the name of the id column

    # build "asc(Class.col1), asc(Class.col2), ..." as a string
    filter_params = self.type_order + "(" + self.table_class + "." + self.order_params[0] + ")"
    for i in self.order_params[1:]:
        filter_params += ", " + self.type_order + "(" + self.table_class + "." + i + ")"

    Session = sessionmaker(bind=self.engine, autoflush=True, autocommit=True)
    session = Session()

    cmd_str = ("session.query(" + self.table_class + ").filter("
               + self.table_class + "." + self.id_name + ".in_(id_list))"
               + ".order_by(" + filter_params + ").all()")

    return eval(cmd_str)

Michael Bayer

Sep 5, 2014, 12:23:31 PM
to sqlal...@googlegroups.com
you batch out the values to be used in the IN, then one of two choices:  my preference is to run separate SELECT statements, using IN with each batch.  If you really can’t do that, you can combine the batches of IN groups with an OR:   “x IN (batch1) OR x IN (batch2) …”
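A minimal sketch of the first option (one SELECT per batch), assuming SQLite's default limit of 999 bound parameters per statement; `run_query` is a hypothetical stand-in for the actual session.query(...).all() call, and `MyClass` is a made-up mapped class:

```python
def chunked(seq, size):
    """Yield successive slices of seq, each at most `size` long."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]

def select_in_batches(run_query, id_list, batch_size=500):
    """Run one query per id chunk and concatenate the results.

    With SQLAlchemy, run_query might be something like:
        lambda chunk: session.query(MyClass).filter(MyClass.id.in_(chunk)).all()
    """
    results = []
    for chunk in chunked(id_list, batch_size):
        results.extend(run_query(chunk))
    return results
```

Each statement then carries at most `batch_size` bound parameters, safely under the limit.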

keep in mind when you send enormous strings to your database, that places a burden on the query system.  Oracle (which I assume you’re using) also caches these queries.





pyArchInit ArcheoImagineers

Sep 9, 2014, 4:45:16 PM
to sqlal...@googlegroups.com
Hi Michael and thanks a lot.


On Friday, 5 September 2014 at 18:23:31 UTC+2, Michael Bayer wrote:
you batch out the values to be used in the IN, then one of two choices:  my preference is to run separate SELECT statements, using IN with each batch.  If you really can’t do that, you can combine the batches of IN groups with an OR:   “x IN (batch1) OR x IN (batch2) …”




I spent 2 days to find a solution:
I have Python build this string:

cmd_str = "session.query(MAPPCLASS).filter(or_(MAPPCLASS.id_invmat.in_([1, 2, 3, 4])).(MAPPCLASS.id_invmat.in_([5, 6, 7, 8]))).order_by(asc(MAPPCLASS.id_invmat)).all()"

But I receive this error:

Neither 'BinaryExpression' object nor 'Comparator' object has an attribute 'or_'

Hmm... any suggestion on the right syntax to pass to eval?
 

keep in mind when you send enormous strings to your database, that places a burden on the query system.  Oracle (which I assume you’re using) also caches these queries


I'm using a SQLite DB.

Best regards and thanks a lot.
Luca

Michael Bayer

Sep 9, 2014, 4:58:59 PM
to sqlal...@googlegroups.com
On Sep 9, 2014, at 4:45 PM, pyArchInit ArcheoImagineers <pyarc...@gmail.com> wrote:

Hi Michael and thanks a lot.


On Friday, 5 September 2014 at 18:23:31 UTC+2, Michael Bayer wrote:
you batch out the values to be used in the IN, then one of two choices:  my preference is to run separate SELECT statements, using IN with each batch.  If you really can’t do that, you can combine the batches of IN groups with an OR:   “x IN (batch1) OR x IN (batch2) …”




I spent 2 days to find a solution:
I have Python build this string:

cmd_str = "session.query(MAPPCLASS).filter(or_(MAPPCLASS.id_invmat.in_([1, 2, 3, 4])).(MAPPCLASS.id_invmat.in_([5, 6, 7, 8]))).order_by(asc(MAPPCLASS.id_invmat)).all()"

But I receive this error:

Neither 'BinaryExpression' object nor 'Comparator' object has an attribute 'or_'

There must be somewhere you're calling <something>.or_(), which is incorrect; or_() is standalone.

Not sure what "cmd_str" is about; you can just do this directly:

args = [<lots of args>]
or_args = []
while args:
    chunk = args[0:1000]
    args = args[1000:]
    or_args.append(MAPCLASS.id.in_(chunk))

session.query(MAPCLASS).filter(or_(*or_args))

pyArchInit ArcheoImagineers

Sep 9, 2014, 5:27:18 PM
to sqlal...@googlegroups.com
Hi Michael,
I use my method for every table of my DB, so I pass the mapper class, the name of the id, etc., and build the command as a string to pass to the eval function.

I'll try your method. Thanks a lot.
Best regards
Luca

pyArchInit ArcheoImagineers

Sep 11, 2014, 1:53:47 AM
to sqlal...@googlegroups.com
Hi,
if I create many "or" requests, each with a small number of ids (lists of 10 values), the script returns this message: Expression tree is too large (maximum depth 1000)

So, is it possible that SQLite has a limit here? This is a big problem for using SQLAlchemy/SQLite.

I cannot chain many selects, because I want to order the query, so I suppose I must do a single SELECT query. Is that correct?

Best regards and thanks a lot
Luca

Michael Bayer

Sep 11, 2014, 9:08:44 AM
to sqlal...@googlegroups.com
I’m assuming pysqlite is raising that, which would be the end of the line for that approach; you need to break it up into individual SELECT statements, or write your data to a temp table and JOIN to that.


Jonathan Vanasco

Sep 11, 2014, 12:39:24 PM
to sqlal...@googlegroups.com
i once thought about extending SqlAlchemy to handle this issue behind the scenes, but each database treats `IN()` differently.  for example: oracle maxes out at a number of elements, while mysql maxes out based on the size of the overall statement (which is configured on the server).  it's too much work to limit this in sqlalchemy, as these limits change across servers.  [ i forget what postgres maxed out on, i think it was a hard number too.]

the workaround I used was to just build a query-base, and then run multiple selects with a single `IN` within a for-loop which appends to a list.  i found that performance to be much better than chaining multiple `IN()` with `OR`

pyArchInit ArcheoImagineers

Sep 13, 2014, 5:25:39 AM
to sqlal...@googlegroups.com
On Thursday, 11 September 2014 at 18:39:24 UTC+2, Jonathan Vanasco wrote:
i once thought about extending SqlAlchemy to handle this issue behind the scenes, but each database treats `IN()` differently.  for example: oracle maxes out at a number of elements, while mysql maxes out based on the size of the overall statement (which is configured on the server).  it's too much work to limit this in sqlalchemy, as these limits change across servers.  [ i forget what postgres maxed out on, i think it was a hard number too.]

the workaround I used was to just build a query-base, and then run multiple selects with a single `IN` within a for-loop which appends to a list.  i found that performance to be much better than chaining multiple `IN()` with `OR`
 
My problem is to fetch a dataset of more than 999 records and sort it all with an ORDER BY statement. How can I use multiple selects and still order all the records?

Let me try to explain my problem:

I have a GUI where I display all the records of my table. I run a search and find 1500 records. Then I want to order the current dataset (1500 records) by some parameter, so I pass my function the list of 1500 record IDs, which is used to look up the records through an IN() query with an ORDER BY statement.

Now I don't understand how I can fetch more than 999 records in SQLite and sort them with ORDER BY. What strategy can I use? Suggestions?

Thanks a lot!
Luca

Wichert Akkerman

Sep 13, 2014, 6:24:56 AM
to sqlal...@googlegroups.com
On 13 Sep 2014, at 11:25, pyArchInit ArcheoImagineers <pyarc...@gmail.com> wrote:

On Thursday, 11 September 2014 at 18:39:24 UTC+2, Jonathan Vanasco wrote:
i once thought about extending SqlAlchemy to handle this issue behind the scenes, but each database treats `IN()` differently.  for example: oracle maxes out at a number of elements, while mysql maxes out based on the size of the overall statement (which is configured on the server).  it's too much work to limit this in sqlalchemy, as these limits change across servers.  [ i forget what postgres maxed out on, i think it was a hard number too.]

the workaround I used was to just build a query-base, and then run multiple selects with a single `IN` within a for-loop which appends to a list.  i found that performance to be much better than chaining multiple `IN()` with `OR`
 
My problem is to fetch a dataset of more than 999 records and sort it all with an ORDER BY statement. How can I use multiple selects and still order all the records?

You can try a different approach, such as creating a temporary table which you fill with all your ids, and then doing a SQL statement with something like WHERE id IN (SELECT id FROM temp_table); that bypasses any limits on the IN operator.

Wichert.
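Wichert's temp-table idea can be sketched with the stdlib sqlite3 module (the table and column names here are invented for the example). Each id is inserted as its own row, so the 999-variable limit never comes into play, and the final SELECT can ORDER BY over the whole result set:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, "label-%05d" % i) for i in range(1, 2001)])

# the 1500 ids found by the search -- far beyond the 999-parameter limit
wanted = list(range(1, 1501))
conn.execute("CREATE TEMP TABLE wanted_ids (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO wanted_ids VALUES (?)", [(i,) for i in wanted])

# one statement, one ORDER BY over the entire filtered set
rows = conn.execute(
    "SELECT id, label FROM items "
    "WHERE id IN (SELECT id FROM wanted_ids) "
    "ORDER BY label DESC"
).fetchall()
```

executemany() binds only one parameter per row, so the id list can be arbitrarily long; the TEMP table is private to the connection and dropped when it closes.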

pyArchInit ArcheoImagineers

Sep 13, 2014, 6:40:54 AM
to sqlal...@googlegroups.com
I had an epiphany!!!

I will put the search parameters used in the GUI into a dictionary.

If I want to sort the records, the script will re-run the search, which at that point will use only a few parameters, and I'll add an ORDER BY to the query.

The script will dynamically build the command as a string in SQLAlchemy syntax, which will then be run via eval().

If all the records of the table are present in my GUI, I'll simply perform a query with id > 0 plus an ORDER BY statement.

I will try it and let you know whether I meet with success or complete defeat. :)

Thanks a lot for putting me on the right path.

Luca