[sqlite-dev] 3ea175512444b0d166af9d674757d83c946d9ce6

Keith Medcalf

unread,

Oct 26, 2019, 5:47:43 PM10/26/19

to sqlit...@mailinglists.sqlite.org

sqlite> insert into t0 values (1);
addr opcode p1 p2 p3 p4 p5 comment
---- ------------- ---- ---- ---- ------------- -- -------------
0 Init 0 19 0 00 Start at 19
1 OpenWrite 0 2 0 1 00 root=2 iDb=0; t0
2 OpenWrite 1 3 0 k(2,,) 00 root=3 iDb=0; sqlite_autoindex_t0_1
3 Integer 1 2 0 00 r[2]=1
4 NewRowid 0 1 0 00 r[1]=rowid
5 Integer 1 3 0 00 r[3]=1
6 Affinity 3 1 0 B 00 affinity(r[3])
7 Noop 0 0 0 00 BEGIN: GenCnstCks(0,1,1,0,0)
8 Noop 0 0 0 00 uniqueness check for sqlite_autoindex_t0_1
9 Integer 1 5 0 00 r[5]=1; sqlite_autoindex_t0_1 column 0
10 IntCopy 1 6 0 00 r[6]=r[1]; rowid
11 MakeRecord 5 2 4 00 r[4]=mkrec(r[5..6]); for sqlite_autoindex_t0_1
12 NoConflict 1 14 5 1 00 key=r[5]
13 Halt 2067 2 0 t0.c1 02
14 MakeRecord 2 1 7 00 r[7]=mkrec(r[2])
15 Noop 0 0 0 00 END: GenCnstCks(0)
16 IdxInsert 1 4 5 2 10 key=r[4]
17 Insert 0 7 1 t0 39 intkey=r[1] data=r[7]
18 Halt 0 0 0 00
19 Transaction 0 1 5 0 01 usesStmtJournal=0
20 Goto 0 1 0 00

addr 9 is incorrect.

The index key should be after affinity is applied (as in r[3]). The reindex causes the index to be rebuilt with affinity applied so the second insert works because it is inserting a different storage type, but a subsequent reindex fails when it tries to apply affinity to the indexed column.

This is with the current tip of trunk (713fe86b8c).
--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

_______________________________________________
sqlite-dev mailing list
sqlit...@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-dev

Richard Hipp

unread,

Oct 26, 2019, 8:02:37 PM10/26/19

to sqlit...@mailinglists.sqlite.org

On 10/26/19, Keith Medcalf <kmed...@dessus.com> wrote:
>
> addr 9 is incorrect.
>
> The index key should be after affinity is applied (as in r[3]). The reindex
> causes the index to be rebuilt with affinity applied so the second insert
> works because it is inserting a different storage type, but a subsequent
> reindex fails when it tries to apply affinity to the indexed column.
>

Nice work, Keith. You are spot on. I had already found and fixed the
bug before I saw your email - otherwise you would have saved me some
work.

The fix was to simply remove the (incorrect) code that recomputes the
virtual column while omitting the type conversion at the end.

That incorrect code was added early during the effort to add generated
columns. Later on, I revised the INSERT algorithm so it precomputes
all of the virtual columns (correctly) and so the existing code that
simply copies over the correct value from the precomputed value array
did the right thing. So the bug fix in this case was to delete code.
Always nice when the solution to a problem is to delete code.

I suppose an optimization opportunity is to not precompute virtual
table values that are never actually used. But we can add
optimizations further down the road...
--
D. Richard Hipp
d...@sqlite.org

Keith Medcalf

unread,

Oct 27, 2019, 2:02:27 AM10/27/19

to sqlit...@mailinglists.sqlite.org

On Saturday, 26 October, 2019 18:03, Richard Hipp <d...@sqlite.org> wrote:

>On 10/26/19, Keith Medcalf <kmed...@dessus.com> wrote:

>> addr 9 is incorrect.

>> The index key should be after affinity is applied (as in r[3]). The
>> reindex causes the index to be rebuilt with affinity applied so the
>> second insert works because it is inserting a different storage type,
>> but a subsequent reindex fails when it tries to apply affinity to the
>> indexed column.

> Nice work, Keith. You are spot on. I had already found and fixed the
> bug before I saw your email - otherwise you would have saved me some
> work.

> ...

> ... I revised the INSERT algorithm so it precomputes all of the

> virtual columns (correctly) and so the existing code that simply
> copies over the correct value from the precomputed value array
> did the right thing. So the bug fix in this case was to delete
> code.

> I suppose an optimization opportunity is to not precompute virtual

> table values that are never actually used. But we can add
> optimizations further down the road...

This should likely be a pruning operation similar to how "unused" LEFT JOINs are pruned.

However, there is another optimization opportunity that may have a big impact but I do not know exactly how you would implement it, except that you have done so with virtual columns in insert, probably as a side-effect of them needing sometimes to be stored. I think it is a generic change to the expression parsing and code generation logic. However, when the same expressions (or virtual columns) are calculated in a SELECT, the values are regenerated rather than using the calculated value. Eg:

create table x(w real, x as (w*w), y as (abs(x)), z as (x/y));
insert into x values (1);

which does not do any redundant calculations,

addr opcode p1 p2 p3 p4 p5 comment
---- ------------- ---- ---- ---- ------------- -- -------------

0 Init 0 17 0 00 Start at 17
1 OpenWrite 0 2 0 1 00 root=2 iDb=0; x
2 Integer 1 2 0 00 r[2]=1
3 NewRowid 0 1 0 00 r[1]=rowid
4 SCopy 2 7 0 00 r[7]=r[2]
5 RealAffinity 7 0 0 00
6 SCopy 2 8 0 00 r[8]=r[2]
7 RealAffinity 8 0 0 00
8 Multiply 8 7 3 00 r[3]=r[8]*r[7]
9 Copy 3 8 0 00 r[8]=r[3]
10 PureFunc0 0 8 4 abs(1) 01
11 Divide 4 3 5 00 r[5]=r[3]/r[4]
12 Noop 0 0 0 00 BEGIN: GenCnstCks(0,1,1,0,0)
13 MakeRecord 2 1 6 E 00 r[6]=mkrec(r[2])
14 Noop 0 0 0 00 END: GenCnstCks(0)
15 Insert 0 6 1 x 39 intkey=r[1] data=r[6]
16 Halt 0 0 0 00
17 Transaction 0 1 7 0 01 usesStmtJournal=0
18 Goto 0 1 0 00

vs the select, which does:

select * from x;

addr opcode p1 p2 p3 p4 p5 comment
---- ------------- ---- ---- ---- ------------- -- -------------

0 Init 0 37 0 00 Start at 37
1 OpenRead 0 2 0 4 00 root=2 iDb=0; x
2 ColumnsUsed 0 0 0 15 00
3 Explain 3 0 0 SCAN TABLE x (~1048576 rows) 00
4 Noop 0 0 0 00 Begin WHERE-loop0: x
5 Rewind 0 35 0 00
6 Noop 0 0 0 00 Begin WHERE-core
7 Column 0 0 1 00 r[1]=x.w
8 RealAffinity 1 0 0 00
9 Column 0 0 5 00 r[5]=x.w
10 RealAffinity 5 0 0 00
11 Column 0 0 6 00 r[6]=x.w
12 RealAffinity 6 0 0 00
13 Multiply 6 5 2 00 r[2]=r[6]*r[5]
14 Column 0 0 5 00 r[5]=x.w
15 RealAffinity 5 0 0 00
16 Column 0 0 7 00 r[7]=x.w
17 RealAffinity 7 0 0 00
18 Multiply 7 5 6 00 r[6]=r[7]*r[5]
19 PureFunc0 0 6 3 abs(1) 01
20 Column 0 0 7 00 r[7]=x.w
21 RealAffinity 7 0 0 00
22 Column 0 0 5 00 r[5]=x.w
23 RealAffinity 5 0 0 00
24 Multiply 5 7 6 00 r[6]=r[5]*r[7]
25 Column 0 0 8 00 r[8]=x.w
26 RealAffinity 8 0 0 00
27 Column 0 0 9 00 r[9]=x.w
28 RealAffinity 9 0 0 00
29 Multiply 9 8 7 00 r[7]=r[9]*r[8]
30 PureFunc0 0 7 5 abs(1) 01
31 Divide 5 6 4 00 r[4]=r[6]/r[5]
32 ResultRow 1 4 0 00 output=r[1..4]
33 Noop 0 0 0 00 End WHERE-core
34 Next 0 6 0 01
35 Noop 0 0 0 00 End WHERE-loop0: x
36 Halt 0 0 0 00
37 Transaction 0 0 7 0 01 usesStmtJournal=0
38 Goto 0 1 0 00

However, other than being wrapped in a loop (and the different output method) 6 through 31 in the SELECT is effectively the same as 4 through 12 in the INSERT -- but the INSERT calculations are obviously more efficient (even if just because there are fewer operations to arrive at the same result). I presume this is because in the INSERT case you are "pre-allocating" registers to hold the columns (as it were), but in the SELECT the expressions are merely dynamically generated (the storage for the "results" are not pre-allocated).

If this could be optimized to work "generically" the same way as the INSERT column generation by pre-allocating the intermediate registers rather than allocating them dynamically, it should work across all places where expressions are computed in SELECT type statements. Simple example:

create table x(a);
insert into x values (1);
select a, a + 1 as b where b == 2

sqlite> select a, a+1 as b from x where b == 2;
QUERY PLAN
`--SCAN TABLE x (~983040 rows)

addr opcode p1 p2 p3 p4 p5 comment
---- ------------- ---- ---- ---- ------------- -- -------------
0 Init 0 19 0 00 Start at 19

1 OpenRead 0 2 0 1 00 root=2 iDb=0; x
2 ColumnsUsed 0 0 0 1 00
3 Explain 3 0 0 SCAN TABLE x (~983040 rows) 00
4 Noop 0 0 0 00 Begin WHERE-loop0: x
5 CursorHint 0 0 0 EQ(ADD(c0,1),2) 00
6 Rewind 0 17 0 00
7 Column 0 0 2 00 r[2]=x.a
8 Add 3 2 1 00 r[1]=r[3]+r[2]
9 Ne 4 16 1 50 if r[1]!=r[4] goto 16
10 Noop 0 0 0 00 Begin WHERE-core
11 Column 0 0 5 00 r[5]=x.a
12 Column 0 0 1 00 r[1]=x.a
13 Add 3 1 6 00 r[6]=r[3]+r[1]
14 ResultRow 5 2 0 00 output=r[5..6]
15 Noop 0 0 0 00 End WHERE-core
16 Next 0 7 0 01
17 Noop 0 0 0 00 End WHERE-loop0: x

18 Halt 0 0 0 00

19 Transaction 0 0 9 0 01 usesStmtJournal=0
20 Integer 1 3 0 00 r[3]=1
21 Integer 2 4 0 00 r[4]=2
22 Goto 0 1 0 00

If the output register for b could be pre-allocated then it would only need to be calculated once (either in the WHERE condition OR in the output, wherever it was first needed) but not in both. This is probably a somewhat major change to the way the select code generator works since it requires pre-allocating registers for expression results and then remembering whether those results have already been calculated (and knowing whether the dependent pre-allocated results have been calculated yet). Since you already did this once (for the insert) you may have some idea how this could be implemented similarly for the general case.

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

_______________________________________________

Bernardo Ramos

unread,

Nov 11, 2019, 9:20:04 PM11/11/19

to sqlit...@mailinglists.sqlite.org

Hello!

On vdbemem.c on function sqlite3VdbeMemAggValue there is a variable
named t that appears to have no use there. Apparently inherited from
copying the code from sqlite3VdbeMemFinalize function.