a database implementation (data structure) for particular queries?

Ivan Shmakov

unread,

Oct 17, 2012, 3:38:53 AM10/17/12

to

I wonder if there's a database implementation (or data
structure, or an index) well suited for queries like the
following:

SELECT * FROM foo
WHERE (a1 IN (a1a, ..., a1P)
AND a2 IN (a2a, ..., a2Q)
...
AND aN IN (aNa, ..., aNZ));

where a1, ..., aN are all but one of the table columns, and
P, Q, ..., Z may vary (and are not necessarily equal to each
other.)

To note is that the amount of INSERT's into the structure is
non-negligible when compared to the number of SELECT's.

Also, I'd be interested in a quick way to compute an estimate
(an upper bound) of the number of rows to be returned from the
query above.

TIA.

--
FSF associate member #7257

Gene Wirchenko

unread,

Oct 17, 2012, 12:40:07 PM10/17/12

to

On Wed, 17 Oct 2012 14:38:53 +0700, Ivan Shmakov <onei...@gmail.com>
wrote:

> I wonder if there's a database implementation (or data
> structure, or an index) well suited for queries like the
> following:

Well-suited in what way? Are you looking for ease-of-use? Speed
of access? What?

> SELECT * FROM foo
> WHERE (a1 IN (a1a, ..., a1P)
> AND a2 IN (a2a, ..., a2Q)
> ...
> AND aN IN (aNa, ..., aNZ));
>
> where a1, ..., aN are all but one of the table columns, and
> P, Q, ..., Z may vary (and are not necessarily equal to each
> other.)

Are you trying to validate the data in the table?

You have not given much detail. What about putting a1a..a1P in a
lookup table and so on for a2*..aN*?

> To note is that the amount of INSERT's into the structure is
> non-negligible when compared to the number of SELECT's.

^^^^^^^^^^^^^^
Nearly meaningless. How many inserts? How many selects?

> Also, I'd be interested in a quick way to compute an estimate
> (an upper bound) of the number of rows to be returned from the
> query above.

Other than the obvious number of rows in the table, I think any
answer would be a guess.

Sincerely,

Gene Wirchenko

com...@hotmail.com

unread,

Oct 19, 2012, 2:41:28 AM10/19/12

to

On Wednesday, October 17, 2012 12:38:55 AM UTC-7, Ivan Shmakov wrote:
> I wonder if there's a database implementation (or data
> structure, or an index) well suited for queries like

> SELECT * FROM foo
> WHERE (a1 IN (a1a, ..., a1P)
> AND a2 IN (a2a, ..., a2Q)
> ...
> AND aN IN (aNa, ..., aNZ));

This is
SELECT * FROM foo
WHERE ((a1=a1a OR ... OR a1=a1P) AND ... AND (aN=aNa OR ... OR aN=aNZ))
or
SELECT foo.*
FROM foo
JOIN (SELECT a1a AS a1 UNION ... UNION SELECT a1P as a1)
...
JOIN (SELECT aNa AS aN UNION ... UNION SELECT aNZ as aN)

If your DBMS is not "well suited" to this then it's not clear what it could be "well suited" for. (Is your context even a DBMS?)

> Also, I'd be interested in a quick way to compute an estimate
> (an upper bound) of the number of rows to be returned

NATURAL JOINing on the same attribute set, the result is no larger than either operand. One operand is a cross product; its size is the product of constituent single-column sizes. So an upper bound is min(#rows(foo), P * ... * Z). (Assuming no duplicates in foo. Otherwise upper bound is #rows(foo).) Each result column can only have values from an operand column. So a stricter upper bound is min(#rows(SELECT a1 FROM foo), P) * ... * min(#rows(SELECT aN FROM foo), Z). But an upper bound is not likely an "estimate" of a typical case.

As Gene wrote, you need to tell us the assumptions behind your questions.

philip