How bad is using queries with thousands of values for operators IN or ANY?

20 views
Skip to first unread message

Thorsten Schöning

unread,
Aug 31, 2020, 3:45:33 AM8/31/20
to jOOQ User Group
Hi all,

I have lots of queries in which I need to restrict access to rows
using some decimal row-ID and am mostly doing so with using the
operator IN in WHERE-clauses. Additionally I'm mostly embedding the
IDs as ","-seperated list into the query directly, e.g. because I
already hit a limitation of ~32k parameters of the JDBC-driver[1] for
Postgres.

I really thought that in most cases simply sending a large amount of
IDs embedded into the query is better than looping, because it safes
roundtrips to access the DB, the planner of the DB has all pieces of
information it needs to decide best strategies etc. OTOH, with recent
tests and an increased number of IDs of about factor 100, I have
additional load in Tomcat before actually sending the query to the DB
already. I've attached an example query and plan.

Do you know of any obvious problem in jOOQ rendering such queries? Is
there some reasonable size limit for queries in terms of plain textual
size to render or number of values or alike? Is it even likely at all
that jOOQ will become a problem e.g. compared to what the JDBC-driver
does with the query itself already?

I'm just trying to collect some input for where to look at to optimize
things in the future. Thanks!

[1]: https://github.com/pgjdbc/pgjdbc/issues/90

Mit freundlichen Grüßen,

Thorsten Schöning

--
Thorsten Schöning E-Mail: Thorsten....@AM-SoFT.de
AM-SoFT IT-Systeme http://www.AM-SoFT.de/

Telefon...........05151- 9468- 55
Fax...............05151- 9468- 88
Mobil..............0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Geschäftsführer: Andreas Muchow
attachment.zip

Lukas Eder

unread,
Aug 31, 2020, 4:44:22 AM8/31/20
to jOOQ User Group
Hi Thorsten,

Thanks for your message. This is a tricky question. Ideally, you should run a near-production style benchmark comparing the alternatives for 1) your hardware, 2) your RDBMS versions, 3) your data set.

The benchmark technique I usually recommend is this simple approach here, where you can use two pgplsql loops doing the same logical work with different approaches:

When it comes to comparing IN lists with ANY(?) using array binds, I've benchmarked this for Oracle and PostgreSQL specifically, finding that starting with ~64 binds, the = ANY(?) array parameter starts to outperform the IN list, on my hardware, my RDBMS version, and my data set:

jOOQ supports either syntax, so you can write a jOOQ based benchmark as well if you like.

Answering your different points:

On Mon, Aug 31, 2020 at 9:45 AM Thorsten Schöning <tscho...@am-soft.de> wrote:
Additionally I'm mostly embedding the
IDs as ","-seperated list into the query directly, e.g. because I
already hit a limitation of ~32k parameters of the JDBC-driver[1] for
Postgres.

jOOQ should protect you from that limit:

But of course, 32k parameters is mostly a problem for any JDBC driver and/or server, so it's not a bad idea to avoid this situation. In PostgreSQL's particular case, you don't profit from using bind variables in such cases, because PostgreSQL's execution plan cache is very simple, compared to e.g. Oracle's or SQL Server's, so you might as well inline all those values, if you're not going to use any of the other options (see below)
 
Do you know of any obvious problem in jOOQ rendering such queries?

Yes. Regrettably, large IN lists are always a problem with almost all RDBMS. I've seen Oracle production databases squeal because of cursor cache contention issues (something you won't run into in PostgreSQL), because of this.
 
Is there some reasonable size limit for queries in terms of plain textual
size to render or number of values or alike?

You need to find out the "reasonable" thresholds empirically.
 
Is it even likely at all that jOOQ will become a problem e.g. compared to what the JDBC-driver
does with the query itself already?

I don't think there's a problem in jOOQ here. The problems are in the JDBC driver (or the wire protocol, more specifically, the driver itself shouldn't have a problem), and in the server.
 
Note there are a few options that most people overlook in these cases:
  1. Ideally, you shouldn't have large IN lists, but reproduce the query that produced all those IDs in the first place, and semi-join that. E.g. SELECT * FROM t WHERE id IN (SELECT id FROM ...), where the subquery is some previous query
  2. Even better, change the business logic itself. Do you really need the large IN list? In my experience, these lists appear when users are presented with long lists of checkboxes, and the only reason why anyone would get such a long list is because the user selected "check all". In case of which case 1) applies. Perhaps the feature can be avoided entirely?
  3. In some cases, a temporary table containing all these IDs can help (do benchmark). You'd have to batch-insert all the IDs to that table, and then again semi-join it like SELECT * FROM t WHERE id IN (SELECT id FROM temp_table)
  4. Again, the array bind variable SELECT * FROM t WHERE id = ANY(?) is a reasonable choice in PostgreSQL
Cheers,
Lukas
Reply all
Reply to author
Forward
0 new messages