is there a way to write this SQL more succinctly? (Dave Nicol's sparse row database)

1 view
Skip to first unread message

David Nicol

unread,
Jan 2, 2010, 1:14:46 PM1/2/10
to kul...@googlegroups.com
I'm mapping property names to integers for storage efficiency (at high
computational cost, giving
the effort dubious worth) using a "tokens" table,
which contains text token, interger primary key n, and integer count
which gets incremented
to maintain a usage statistic. (although my database might actually
keep that already in its
indexing mechanism but lets pretend I don't suspect that.)

This statement gets called with the name of the token before any SQL
that refers to the
properties table


$dbh->prepare(
"INSERT OR REPLACE INTO tokens (token, n, count) VALUES (
? ,
( select n from tokens where token = ?1 ),
(
CASE
WHEN EXISTS ( select 1 from tokens where token = ?1 )
THEN (select 1+count from tokens where token = ?1)
ELSE 1
END
)
)"

is there a way to reduce the number of subselects?

"INSERT OR REPLACE INTO tokens (token, n, count)
SELECT ? , n, 1+count FROM tokens WHERE token=?1"

won't work because the first time we see a token the subselect won't
return anything.

I suppose the Proper SQL Way here is to decouple the two operations
(token initialization, statistics tracking) into separate, simpler
statements, one INSERT OR IGNORE and one UPDATE for the statistics,
but I like the autovivification semantic I have with my one statement
currently.

Any advice?

Can CASE supposed to return multiple columns? That would cut from
three to two, but the documentation on CASE all seems to indicate that
it returns a scalar.

--
Is it the time when there isn't time to discuss but there is time to act yet?

Larry Sanders

unread,
Jan 2, 2010, 3:57:38 PM1/2/10
to kul...@googlegroups.com
Perhaps COALESCE

> --
>
> You received this message because you are subscribed to the Google Groups
> "kulua-l" group.
> To post to this group, send email to kul...@googlegroups.com.
> To unsubscribe from this group, send email to
> kulua-l+u...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/kulua-l?hl=en.
>
>
>

--
Sent from my mobile device

Frank Wiles

unread,
Jan 3, 2010, 2:04:43 PM1/3/10
to kul...@googlegroups.com

I'm assuming you're using MySQL based on the INSERT OR REPLACE INTO
syntax, if so then I think you're really wanting this syntax:

http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html

I'm big on PostgreSQL myself, so whenever I find myself doing stuff
like this I just write a small stored procedure that handles the logic
for me. Depending on the data itself, I would probably opt to attempt
to update the row first, if it fails do to there not being a row that
matches, I would then do an insert with a DEFAULT of 1 on the counter
column. This assumes you'll have more updates than INSERTs over time,
otherwise you'd flip the logic and attempt the INSERT first.

--
Frank Wiles
Revolution Systems | http://www.revsys.com/
fr...@revsys.com | (800) 647-6298

David Nicol

unread,
Jan 3, 2010, 6:01:34 PM1/3/10
to kul...@googlegroups.com
On Sun, Jan 3, 2010 at 1:04 PM, Frank Wiles <fr...@wiles.org> wrote:

> I'm assuming you're using MySQL based on the INSERT OR REPLACE INTO
> syntax, if so then I think you're really wanting this syntax:
>
> http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html
>
> I'm big on PostgreSQL myself, so whenever I find myself doing stuff
> like this I just write a small stored procedure that handles the logic
> for me.  Depending on the data itself, I would probably opt to attempt
> to update the row first, if it fails do to there not being a row that
> matches, I would then do an insert with a DEFAULT of 1 on the counter
> column.  This assumes you'll have more updates than INSERTs over time,
> otherwise you'd flip the logic and attempt the INSERT first.

thanks, ON DUPLICATE KEY UPDATE looks like exactly what this needs,
but the MySQL emulation in SQLite, which I'm prototyping in,
eventually to migrate to a back-end unknown -- perhaps Tokyo cabinet
-- doesn't document that clause.

SQLite allows multiple statements in a TRIGGER, which I suppose
amounts to a stored procedure. The prototype currently uses the
COALESCE function to avoid one of the three subselects in my original
post. So Postgre doesn't support INSERT OR REPLACE, but has stored
procedures to provide equivalent function. An INSERT OR UPDATE is
really what I needed, and what I get by specifying the same integer
primary key as part of the VALUES block in the INSERT OR REPLACE. So
I'm good for now, and hopefully can stay away from this particular bit
of optimization until it ripens into necessity.

Dario Landazuri

unread,
Jan 4, 2010, 10:59:28 AM1/4/10
to kul...@googlegroups.com
> thanks, ON DUPLICATE KEY UPDATE looks like exactly what this needs,
> but the MySQL emulation in SQLite, which I'm prototyping in,
> eventually to migrate to a back-end unknown -- perhaps Tokyo cabinet
> -- doesn't document that clause.

You know, if you're prototyping for an unknown backend, wouldn't you
want to keep away from procedures specific to any one backend, be it
mysql, postgres, oracle, etc...?

Maybe it's just me...

Cheers,
Dario

--
************************************************************
Dario Landazuri Triangle Fraternity Minn97Ok
da...@landazuri.net
http://www.landazuri.net
************************************************************
"How many teamsters does it take to screw in a light bulb?"
"FIFTEEN!! YOU GOT A PROBLEM WITH THAT?"

Frank Wiles

unread,
Jan 4, 2010, 11:05:18 AM1/4/10
to kul...@googlegroups.com

I'm in agreement with Dario on this. If you're not sure what backend
you're going to be using then just do this as multiple simple SQL
statements. At the point this strains SQLite, even for dev testing,
you'll have wanted to move onto big and better things for all of your
other queries as well and can adjust your SQL accordingly when you've
picked a backend.

And if you're considering Tokoyo Cabinet, then all of this is moot as
it is a key/value store. In general I think you're pre-optimizing
unnecessarily.

David Nicol

unread,
Jan 4, 2010, 1:19:33 PM1/4/10
to kul...@googlegroups.com
On Mon, Jan 4, 2010 at 10:05 AM, Frank Wiles <fr...@wiles.org> wrote:
>
> I'm in agreement with Dario on this.  If you're not sure what backend
> you're going to be using then just do this as multiple simple SQL
> statements.  At the point this strains SQLite, even for dev testing,
> you'll have wanted to move onto big and better things for all of your
> other queries as well and can adjust your SQL accordingly when you've
> picked a backend.
>
> And if you're considering Tokoyo Cabinet, then all of this is moot as
> it is a key/value store.  In general I think you're pre-optimizing
> unnecessarily.

Undoubtedly. The unnecessary bit was using numbers for property names
at all: last week the name column in the properties table was text,
now its integer. Might move the tokens table language-side instead of
making the database do it all, but right now the database is doing it
all, and there are scads of problems not related to getting the
persistence to work right.

Thanks for your advice.

Reply all
Reply to author
Forward
0 new messages