Hi All,
I have implemented cursors, and they are a huge step forward.
the idea w/ cursors is to let long running queries and short running
queries execute in parallel w/o serious blocking. since Alchemy is
single threaded, large selects/deletes/updates BLOCK the entire
database during their execution. Luckily any select/update/delete
dealing w/ less than 1000rows happens so quickly it doesnt really
matter, but the case for 1000+ rows must also be addressed, so the
solution is to break them up.
The simplest way to break up (we will use updates) a large update is
to use LIMIT OFFSET.
So the LARGE UPDATE
"UPDATE table SET count=4 WHERE fk =1" -> which may update 100K rows
would be broken up into
1.) "UPDATE table SET count=4 WHERE fk =1 ORDER BY fk LIMIT 1000
OFFSET 0"
2.) "UPDATE table SET count=4 WHERE fk =1 ORDER BY fk LIMIT 1000
OFFSET 1000"
3.) "UPDATE table SET count=4 WHERE fk =1 ORDER BY fk LIMIT 1000
OFFSET 2000"
.....
99.) "UPDATE table SET count=4 WHERE fk =1 ORDER BY fk LIMIT 1000
OFFSET 99000"
I tested this approach, in the following gist:
https://gist.github.com/739756,
and the fantastic news is this approach only results in an 18%
performance hit (for parallel running SELECTs). So long running
queries and short running queries are very doable in AlchemyDB, but
the long running queries SHOULD be batched.
But I wanted a simpler approach, the logic to do these 100 batch
updates is always the same (get number of rows, break up into n/nrows
batch updates), so I automated it via a new syntax:
"UPDATE table SET count=4 WHERE fk =1 ORDER BY fk LIMIT 1000 OFFSET
variable"
The "variable" at the end will store the current offset in a redis
string, and the same statement can be looped on.
So calling
"UPDATE table SET count=4 WHERE fk =1 ORDER BY fk LIMIT 1000 OFFSET
variable"
and then sleeping 10ms inbetween calls and stopping the batch when the
number_of_rows_updated is less than 1000 is a dead simple way to do
this.
Here is a gist showing how to do this in Lua: (i forgot to code in the
sleep)
https://gist.github.com/741051
In psuedocode a LARGE UPDATE should look like this
limit=1000;
updated_rows=1000;
while (updated_rows == limit) {
updated_rows = client("UPDATE table SET count=4 WHERE fk =1 ORDER BY
fk LIMIT " + limit + " OFFSET variable");
}
This large_update(), or large_select, or large_delete, can easily be
abstracted out into the client, where the developer can elect to use
large_updates when he knows an update statement will be expensive.
This is a HUGE step forward, the problem of parallel long and short
running queries is a universal database problem, this approach is a
good one
- Jak
p.s. if there is any interest/need, this same "cursor" approach can be
used to build indexes in batches (which can be very time consuming as
any DBA can attest to)