Matching against a large number of values

691 views
Skip to first unread message

Steffen Fritzsche

unread,
Feb 18, 2013, 4:55:08 PM2/18/13
to scala...@googlegroups.com
Hi,

I've got the following query:

Values.filter(v => v.change.inSetBind(uuids) && v.artifact === artifact.bind).to[IndexedSeq]

My problem: The "uuids" set might be very large and PostgreSQL driver just accepts 2^16 values max in an IN-construct (according to http://www.postgresql.org/message-id/168327...@talk.nabble.com). Sadly, some of my sets are already too large, and I'm getting the same exception as mentioned in the linked thread:

app: Caused by: java.io.IOException: Tried to send an out-of-range integer as a 2-byte value: 108270
app:    at org.postgresql.core.PGStream.SendInteger2(PGStream.java:201) ~[postgresql-9.1-901-1.jdbc4.jar:na]
app:    at org.postgresql.core.v3.QueryExecutorImpl.sendParse(QueryExecutorImpl.java:1237) ~[postgresql-9.1-901-1.jdbc4.jar:na]

Any idea how to rewrite that query with slick to support matching against larger sets?

Thanks for any inspiration!
Steffen

Rogach

unread,
Feb 19, 2013, 9:58:51 AM2/19/13
to scala...@googlegroups.com
Wouldn't it be better (even wrt performance) to store those values in some sort of temporary table?

Steffen Fritzsche

unread,
Feb 19, 2013, 10:48:04 AM2/19/13
to scala...@googlegroups.com
Those queries are part of a longer analysis process. They are executed only once and each element in the list will occur only once in a query. In this special case creating a temporary table is just an additional step.

My current hack for this problem is to group into n-chunks of 2^16 elements, execute n-queries and just aggregate the result in a vector builder. It does work, it's fast enough (~200ms for 110017 values, overall analysis is running 3h) but it does not feel right :)

Cheers, Steffen



Am 19.02.2013 um 15:58 schrieb Rogach <platon...@gmail.com>:

> Wouldn't it be better (even wrt performance) to store those values in some sort of temporary table?
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups "Slick / ScalaQuery" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to scalaquery+...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>

Rogach

unread,
Feb 19, 2013, 10:53:30 AM2/19/13
to scala...@googlegroups.com
Again, I'm not sure, but you definitely should try to benchmark the route with a temporary table. How many values do you have in that 'Values' table?

Steffen Fritzsche

unread,
Feb 19, 2013, 10:59:24 AM2/19/13
to scala...@googlegroups.com
I've analyzed three projects so far, the value table contains nearly 8 million entries. The number of entries relating to the project causing the error is 4812304. I will give the temp table a shoot …

Steffen
Reply all
Reply to author
Forward
0 new messages