cursors in Alchemy

Jak Sprats

unread,

Dec 14, 2010, 3:51:52 PM12/14/10

to redisql-dev

Hi All,

I have implemented cursors, and they are a huge step forward.

the idea w/ cursors is to let long running queries and short running
queries execute in parallel w/o serious blocking. since Alchemy is
single threaded, large selects/deletes/updates BLOCK the entire
database during their execution. Luckily any select/update/delete
dealing w/ less than 1000rows happens so quickly it doesnt really
matter, but the case for 1000+ rows must also be addressed, so the
solution is to break them up.

The simplest way to break up (we will use updates) a large update is
to use LIMIT OFFSET.
So the LARGE UPDATE
"UPDATE table SET count=4 WHERE fk =1" -> which may update 100K rows
would be broken up into
1.) "UPDATE table SET count=4 WHERE fk =1 ORDER BY fk LIMIT 1000
OFFSET 0"
2.) "UPDATE table SET count=4 WHERE fk =1 ORDER BY fk LIMIT 1000
OFFSET 1000"
3.) "UPDATE table SET count=4 WHERE fk =1 ORDER BY fk LIMIT 1000
OFFSET 2000"
.....
99.) "UPDATE table SET count=4 WHERE fk =1 ORDER BY fk LIMIT 1000
OFFSET 99000"

I tested this approach, in the following gist: https://gist.github.com/739756,
and the fantastic news is this approach only results in an 18%
performance hit (for parallel running SELECTs). So long running
queries and short running queries are very doable in AlchemyDB, but
the long running queries SHOULD be batched.

But I wanted a simpler approach, the logic to do these 100 batch
updates is always the same (get number of rows, break up into n/nrows
batch updates), so I automated it via a new syntax:
"UPDATE table SET count=4 WHERE fk =1 ORDER BY fk LIMIT 1000 OFFSET
variable"
The "variable" at the end will store the current offset in a redis
string, and the same statement can be looped on.

So calling
"UPDATE table SET count=4 WHERE fk =1 ORDER BY fk LIMIT 1000 OFFSET
variable"
and then sleeping 10ms inbetween calls and stopping the batch when the
number_of_rows_updated is less than 1000 is a dead simple way to do
this.

Here is a gist showing how to do this in Lua: (i forgot to code in the
sleep) https://gist.github.com/741051

In psuedocode a LARGE UPDATE should look like this
limit=1000;
updated_rows=1000;
while (updated_rows == limit) {
updated_rows = client("UPDATE table SET count=4 WHERE fk =1 ORDER BY
fk LIMIT " + limit + " OFFSET variable");
}

This large_update(), or large_select, or large_delete, can easily be
abstracted out into the client, where the developer can elect to use
large_updates when he knows an update statement will be expensive.

This is a HUGE step forward, the problem of parallel long and short
running queries is a universal database problem, this approach is a
good one

- Jak

p.s. if there is any interest/need, this same "cursor" approach can be
used to build indexes in batches (which can be very time consuming as
any DBA can attest to)

Jak Sprats

unread,

Dec 14, 2010, 4:36:52 PM12/14/10

to redisql-dev

Hi All,

needless to say, doing batch updates (to open up other parallel
queries) throws consistency out the window ... a batch update can take
5 minutes, and all the while SELECTs will be running against the
table, some hitting updated rows, some not ... this is the desired
functionality and the reason AlchemyDB is so fast is it does not
ensure consistency other than that ALL actions are themselves atomic,
once they are broken up into batches, they are no longer consistent.

Just a heads up .. most of the time, a large update or delete does not
need to be atomic

- Jak

> sleep)https://gist.github.com/741051

Jak Sprats

unread,

Dec 15, 2010, 7:03:34 PM12/15/10

to redisql-dev

in bash

function large_update() {
if [ -z "$4" ]; then
echo "Usage: large_update statement limit variable sleep_throttle"
return;
fi
STMT="$1"
LIMIT="$2"
VAR="$3"
THRTL="$4"

MOD_STMT="$STMT LIMIT $LIMIT OFFSET $VAR"
time (
while true; do
RES=$($CLI $MOD_STMT | cut -f 2 -d \ )
if [ "$RES" != "$LIMIT" ]; then echo "BREAK"; break; fi
sleep $THRTL
done
)
}

call from command line:
large_update "UPDATE ten_mill_modTEN_fk SET count = count + 1 WHERE
fk=1 ORDER BY fk" 5000 CURSOR_ten_mill_modTEN_fk_UPDATE 0.01

oh yeah and expressions in update statements are also supported "SET
count = count + 1" ... I will post on that 2moro

Reply all

Reply to author

Forward