[sqlite-dev] Are aggregate functions not deterministic?

14 views
Skip to first unread message

Bernardino Ramos

unread,
Mar 10, 2019, 12:49:02 AM3/10/19
to sqlit...@mailinglists.sqlite.org
Hi guys!

I saw that the aggregate functions are not marked as contanst/deterministic.

Is there a reason for that?

I guess that a case like this:

SELECT count(*) FROM table

may be deterministic.

I wonder if it is due to some special corner cases, like the sum of floats on different processors... or any other obscure case.

I guess that this flag is used mainly for the validity of a statement execution to cache results from functions, right?

I changed the AGGREGATE, AGGREGATE2 and WAGGREGATE macros to be constant and run the fulltest. It run with 0 errors. But I don't know if the tests cover these special corner cases, if they exist.

If they exist, does it affect all the aggregate functions or just some of them?

A very big thanks in advance!

Bernardo Ramos

Luuk

unread,
Mar 10, 2019, 9:41:57 AM3/10/19
to sqlit...@mailinglists.sqlite.org


On 10-3-2019 06:47, Bernardino Ramos wrote:
Hi guys!

I saw that the aggregate functions are not marked as contanst/deterministic.

Is there a reason for that?

I guess that a case like this:

SELECT count(*) FROM table

may be deterministic.

Is only deterministic as long as you do not INSERT or DELETE records.

For some tables this might be true for a very long time, for others table this can be a very, very, short time......




Richard Damon

unread,
Mar 10, 2019, 10:06:28 AM3/10/19
to sqlit...@mailinglists.sqlite.org
On 3/10/19 9:41 AM, Luuk wrote:
>
>
> On 10-3-2019 06:47, Bernardino Ramos wrote:
>> Hi galue uys!

>>
>> I saw that the aggregate functions are not marked as
>> contanst/deterministic.
>>
>> Is there a reason for that?
>>
>> I guess that a case like this:
>>
>> SELECT count(*) FROM table
>>
>> may be deterministic.
>
> Is only deterministic as long as you do not INSERT or DELETE records.
>
> For some tables this might be true for a very long time, for others
> table this can be a very, very, short time......
>
And SQLite doesn't want to keep track of changes in the table
invalidating the constant for the aggregate function, and the mere fact
that it would have to indicates that the functions are NOT
deterministic, that being a function of just the parameter and not
anything external to it.

You can tell what value abs(-4) is by looking at it. That is a property
of being deterministic. You can't tell the value of count(*) without
knowing information about the database/table.

--
Richard Damon

_______________________________________________
sqlite-dev mailing list
sqlit...@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-dev

Bernardo Ramos

unread,
Mar 10, 2019, 1:51:49 PM3/10/19
to sqlit...@mailinglists.sqlite.org

On 10 Mar 2019, at 11:06, Richard Damon <Ric...@Damon-Family.org> wrote:

On 3/10/19 9:41 AM, Luuk wrote:


On 10-3-2019 06:47, Bernardino Ramos wrote:
Hi galue uys!

I saw that the aggregate functions are not marked as
contanst/deterministic.

Is there a reason for that?

I guess that a case like this:

SELECT count(*) FROM table

may be deterministic.

Is only deterministic as long as you do not INSERT or DELETE records.

For some tables this might be true for a very long time, for others
table this can be a very, very, short time......

And SQLite doesn't want to keep track of changes in the table
invalidating the constant for the aggregate function, and the mere fact
that it would have to indicates that the functions are NOT
deterministic, that being a function of just the parameter and not
anything external to it.

You can tell what value abs(-4) is by looking at it. That is a property
of being deterministic. You can't tell the value of count(*) without
knowing information about the database/table.

-- 
Richard Damon


OK. I was considering deterministic in another way: returning the same value if the arguments and the internal data are the same.

Now I understood that this concept could not be sufficient for the duration of a statement execution (considering a read statement which returns data in many steps, and a write statement that could change the underlying data during that time).


Just wondering: a SQL query that uses an aggregate function twice should not cache the result? It appears that if the function result is different on 2 invocations on the same query the final result would not be as expected.

Suppose a query like this: (indeed better ones, but this is what came to my mind now)

SELECT * FROM table1 WHERE some_field1 IN (SELECT count(*) FROM table2) OR some_field2 IN (SELECT count(*) FROM table2)

If the system does not cache the result of the common sub-query, the observed result may be different than the expected one... or should we change our default expectations and consider that table2 can change during the statement execution affecting just one of these identical sub-queries? (the last one to be executed on the VDBE)

My default expectation is that the engine returns rows in which some_field1 equals some_field2.


Bernardo Ramos

Luuk

unread,
Mar 10, 2019, 2:55:21 PM3/10/19
to sqlit...@mailinglists.sqlite.org
_______________________________________________
sqlite-dev mailing list
sqlit...@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-dev

A quick test shows what sqlite does do (i changed the query, but you already noted that there are better ones ;):

sqlite> drop table test;
sqlite> create table test (i int primary key);
sqlite> insert into test values (1);
sqlite> insert into test values (2);
sqlite> insert into test values (3);
sqlite> insert into test values (4);
sqlite>.eqp on
sqlite> select * from test where i in (select count(*) from test) or i in (select count(*) from test);
QUERY PLAN
`--MULTI-INDEX OR
   |--INDEX 1
   |  |--LIST SUBQUERY 1
   |  |  `--SCAN TABLE test
   |  `--SEARCH TABLE test USING COVERING INDEX sqlite_autoindex_test_1 (i=?)
   `--INDEX 2
      |--LIST SUBQUERY 2
      |  `--SCAN TABLE test
      `--SEARCH TABLE test USING COVERING INDEX sqlite_autoindex_test_1 (i=?)
i
4
sqlite>

It seems to search the table twice? or am i mis-interpreting these results?

Keith Medcalf

unread,
Mar 10, 2019, 3:14:01 PM3/10/19
to sqlit...@mailinglists.sqlite.org

That is a nasty (as in badly composed) query. You are using IN to compare against a scalar value. While this is not prohibited, why on earth would you do that? The following achieves the desired result, is more readable, and the scalar subquery is only computed once (why would you want to compute it more than once since YOU know that it is (and must be) constant?) (that is, it is not a correlated subquery executed for each row):

SELECT * FROM table1 WHERE (SELECT count(*) FROM table2) in (some_field1, some_field2)

(select count(*) from table2) is a constant and is only computed once ... however if you use it twice (as in your original query), then it will be computed twice.

The query planner does not know that they are the same subquery -- merely that they are subquery's returning a scalar constant result.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.


>-----Original Message-----
>From: sqlite-dev [mailto:sqlite-de...@mailinglists.sqlite.org]
>On Behalf Of Bernardo Ramos
>Sent: Sunday, 10 March, 2019 11:50
>To: sqlit...@mailinglists.sqlite.org
>Subject: Re: [sqlite-dev] Are aggregate functions not deterministic?
>
>
>
>
> On 10 Mar 2019, at 11:06, Richard Damon <Richard@Damon-

Luuk

unread,
Mar 15, 2019, 1:44:19 PM3/15/19
to sqlit...@mailinglists.sqlite.org

On 10-3-2019 20:13, Keith Medcalf wrote:
> That is a nasty (as in badly composed) query. You are using IN to compare against a scalar value. While this is not prohibited, why on earth would you do that? The following achieves the desired result, is more readable, and the scalar subquery is only computed once (why would you want to compute it more than once since YOU know that it is (and must be) constant?) (that is, it is not a correlated subquery executed for each row):
>
> SELECT * FROM table1 WHERE (SELECT count(*) FROM table2) in (some_field1, some_field2)
>
> (select count(*) from table2) is a constant and is only computed once ... however if you use it twice (as in your original query), then it will be computed twice.
>
> The query planner does not know that they are the same subquery -- merely that they are subquery's returning a scalar constant result.
>
>
Bernardo was already saying this about the example query:

"indeed better ones, but this is what came to my mind now"

Are there people in this list who can write this query even better?


;)

Reply all
Reply to author
Forward
0 new messages