ORDER BY COL LIMIT X now supported

1 view
Skip to first unread message

Jak Sprats

unread,
Oct 14, 2010, 5:23:46 PM10/14/10
to redisql-dev

and the following SQL features will be released in the next 2 weeks
1.) IN () - plus a brand new feature called the non-relational join
2.) auto_increment
3.) MIN(X), MAX(X), COUNT(X), DISTINCT(X) - no "group by"

and the node.js client is right around the corner

Jak Sprats

unread,
Oct 16, 2010, 1:11:03 AM10/16/10
to redisql-dev

ORDER BY col LIMIT X now supported for ALL use cases

including
select, iselect, istore, join, jstore, iupdate, idelete

The same use cases will be supported for "IN ()"

Jak Sprats

unread,
Oct 20, 2010, 4:34:33 AM10/20/10
to redisql-dev

SELECT ... WHERE col IN (,,,,) is now supported

including the non relation join
http://code.google.com/p/redisql/wiki/CommandReference#Index_Operations

NailK

unread,
Oct 26, 2010, 2:01:34 PM10/26/10
to redisql-dev
How fast will be COUNT(x) compared to ZCOUNT?

Jak Sprats

unread,
Oct 26, 2010, 7:04:52 PM10/26/10
to redisql-dev
Hi Nialk,

COUNT(X) should be called COUNT(*), my mistake. [COUNT(X) only makes
sense w/ a GROUP BY]

COUNT(*) will walk a B+Tree for simple range queries, for joins, it
must join and then count.

ZSets are zipmaps and the ZCOUNT is done by walking a linked list and
counting the number of entries.

I would imagine COUNT(*) to be about the same speed as ZCOUNT and
possibly slightly (5%) slower as walking a B+Tree is generally slower
than walking a Linked List.

If the ZSET is very large, then COUNT(*) may greatly outperform ZCOUNT
just because Redisql tables are much more memory efficient than redis-
ZSETs and getting the data to the CPU might take up more resources
than the algorithms themselves (this phenomenon is commonly referred
to as the memory wall).

If you give me more details on your use-cases, I may be able to give
better guesses.

- Jak

NailK

unread,
Oct 27, 2010, 2:54:20 PM10/27/10
to redisql-dev
Hi Jak,

Thanks for the answer. COUNT(x) looks very good, when it's ready I'll
try redisql for my rankings - bunch of zsets each about 100,000 size.
Redisql is really nice! I especially like compressed tables and
ability to archive to mysql.

Jak Sprats

unread,
Oct 27, 2010, 10:44:52 PM10/27/10
to redisql-dev
Hi Nialk,

turn on email updates for this post, I will implement COUNT(*) w/in
2-3 weeks.

One question, are your rankings FLOATs or INTs?

2nd question, can you give me 1-2 examples of how you would use
COUNT(*), I assume its a range query on a primary key, is that
correct?

- Jak

Jak Sprats

unread,
Oct 28, 2010, 4:08:00 AM10/28/10
to redisql-dev
HI Nailk,

i just checked in "SELECT COUNT(*)" to github ... (I just sat down and
did it quickly, I thought it would take more work)

I did some not-so-scientiic benchmarking of "SELECT COUNT(*)" vs
ZCOUNT and what I wrote above seemed to be true, but the tests were on
a system running a full desktop's worth of apps, etc... not-so-
scientific.

try it out, tell me if it is quicker, I think if you are doing 100K
range-queries "SELECT COUNT(*)" will be quicker.

Syntax for command line looks like this: ./redisql-cli SELECT COUNT(*)
from test where id between 100000001 AND 100120000

- Jak

NailK

unread,
Oct 29, 2010, 2:59:36 AM10/29/10
to redisql-dev
Hi Jak,

Thanks, I'll look at it in next few days and then will post a reply.

NailK

unread,
Nov 5, 2010, 3:02:52 PM11/5/10
to redisql-dev
Hi Jak,

I've used following table in order to emulate zsets: (obj_id int
primary key, score int) with an index by score.
Table size was 100K rows with random scores varied 100-20000.

Comparing inserts and updates speed I've found that Redisql is not
slower than Redis (in fact it was slightly faster). There were about
9-11K updates/sec.

Also it is good that Redisql occupied almost twice less of memory (to
compare memory usage I used 1M of rows instead of 100K). Insert
performance didn't decline on larger data set.

Then I've compared zcount, zrank and "select count(*) from table where
score between 0 and <random score>" (range query by index key but the
key is not primary).

Here are the results:
- zrank <random obj_id>: 9-10K selects/sec
- zcount 0..<random score>: 129 selects/sec
- select count(*) where score between 0 and <random score>: 20 selects/
sec

As we see, zrank operation is highly optimized while zcount and
count(*) are much slower.
And comparing zcount and count(*) it seems there might be things to
improve.

So I think Redisql is better if there is no need in massive range scan
queries - good speed on inserts/updates (and probably selects by
primary key also fast) and memory usage is especially good.

All the test were done with Predisql in single thread.

Regards,
Nail

Jak Sprats

unread,
Nov 5, 2010, 4:16:01 PM11/5/10
to redisql-dev
Hi Nail,

nice tests .. and glad to hear all the posititive points (speed,
memory usage, predictable performance) and the negative ones (count(*)
is slow) also.

This is a good example of Redisql's memory optimisations. In this
test, both Redisql and redis have to store pairs of integers, so the
chances for compression are not possible, and both have 2 indexes, the
only way to save memory is by being very exact w/ memory usage in data-
structures ... so I am happy to hear the results of this test, I had
not tested this exact scenario.

ZRANK is equivalent to "SELECT score where obj_id = <random_obj_id>" -
which should outperform ZRANK.

My implementations of "SELECT COUNT(*)" was the quickest one I could
possibly do, so I did not expect optimal performance, but 6 times
slower is bad.
"SELECT COUNT(*)"'s performance can be massively improved, it is just
a question of me spending ?1-2 days? coding it.
The current implementation iterates the B+Tree blindly, whereas
"SELECT COUNT(*)" can be done ignoring all the leaves except the ones
containing "min,max"
This requires a special B+tree iterator ... which is complicated and
really should not be rushed.
Currently each B+tree node holds 256 values, so using this iterator
and counting a range query of 20K would involve iterating through less
than 80 nodes where it now goes through all 20K :)

Is the lack of performance of "SELECT COUNT(*)" a deciding factor for
you?

- Jak

Jak Sprats

unread,
Nov 5, 2010, 5:08:54 PM11/5/10
to redisql-dev
Hi Nail,

submitted as issue 19
http://code.google.com/p/redisql/issues/detail?id=19

can you give me some feedback on how important this is to you, as that
will effect scheduling

- Jak

NailK

unread,
Nov 5, 2010, 6:15:57 PM11/5/10
to redisql-dev
Jak,

Yes, it is a deciding factor, though it can wait a couple of weeks.
I'm doing some code migration from mysql to redis and I would start to
migrate to redisql instead, hoping it will work fast.
When you could implement this?

Regards,
Nail

On Nov 6, 3:08 am, Jak Sprats <jakspr...@gmail.com> wrote:
> Hi Nail,
>
> submitted as issue 19http://code.google.com/p/redisql/issues/detail?id=19

NailK

unread,
Nov 5, 2010, 7:00:38 PM11/5/10
to redisql-dev
Jak,

Considering LIMIT clause, are there plans to add offset value (for
skipping N first rows)? Probably this case would be tied to fast
COUNT(*) as it's related to iterating over (possibly) large rowset.
I've just realised there should be an analogue of ZRANGEBYSCORE:
select * from table order by score desc limit 50000,20

Regards,
Nail

On Nov 6, 3:08 am, Jak Sprats <jakspr...@gmail.com> wrote:
> Hi Nail,
>
> submitted as issue 19http://code.google.com/p/redisql/issues/detail?id=19

Jak Sprats

unread,
Nov 5, 2010, 8:12:11 PM11/5/10
to redisql-dev
Hi Nail,

I can implement the sped up COUNT(*) w/in 7 days (maybe earlier).

As for supporting LIMIT OFFSET, yeah this is very simple, can be done
anytime.

Can you write out ALL SQL use-cases you need, then I can try and get
them done in one sweep.

I have many things on my TODO list of equal priority, so if you need
one, it bumps up its priority :)

- Jak

NailK

unread,
Nov 8, 2010, 9:22:09 AM11/8/10
to redisql-dev
Hi Jak,

Thanks. I think LIMIT OFFSET would be enough minimum for me.

Regards,
Nail

Jak Sprats

unread,
Nov 8, 2010, 8:29:48 PM11/8/10
to redisql-dev
Hi Nail,

I will check in LIMIT OFFSET tomorrow, it was maybe a 20 minute coding
effort, but it is coming in the middle of a major rewrite of the line
protocol, and it made sense to do it w/ the checking.

One question, you are doing "SELECT COUNT(*) FROM tbl WHERE
secondary_indexed_column BETWEEN 10 and 20000" ... a range query on a
secondary index, so not on the primary key, correct?
The methods to count from the primary key and the secondary keys is
different.

If you are only doing "SELECT COUNT(*) FROM tbl WHERE
secondary_indexed_column = X",(like in your issue 20 script) I can
give you very fast performance, the data is already counted.

- Jak

NailK

unread,
Nov 9, 2010, 6:27:33 AM11/9/10
to redisql-dev
Hi Jak,

Yes, there is a range query on a secondary index.

Regards,
Nail

Jak Sprats

unread,
Nov 9, 2010, 11:59:46 AM11/9/10
to redisql-dev
Hi Nail,

I just coded and tested this (will check it in later today, latest
tommorow) and it has O(1) lookup, and it took only 4 lines of code ...
so I am glad we communicated on this ... doing a "SELECT COUNT(*) on a
secondary index" is something B+Trees were designed to do ... the same
on a primary index requires a specialised and pretty sophisticated
iterator (so I am gonna wait on doing that until someone needs it).

Check in later today, latest tommorow (w/ LIMIT OFFSET too), you will
have to reinstall the php library, because I redid the entire wire
protocol, including all clients :)

- Jak

p.s. just as a point of reference, here is the gist I used (https://
gist.github.com/669378) - it will only work w/ the updated code

Jak Sprats

unread,
Nov 10, 2010, 5:25:05 PM11/10/10
to redisql-dev
Hi Nail,

LIMIT OFFSET is supported in release 0.1.1

SELECT COUNT (*) on Range Queries on foreign keys are now N*O(1) where
N is the number of distinct foreign keys.

SELECT COUNT(*) where score = 1 -> 1*O(1)
SELECT COUNT(*) where score BETWEEN 1 AND 3 -> 3*O(1)

It will be REAL FAST.
This gist ((https://gist.github.com/669378 - which is very primitive)
returns 0(1) times.

You need to update your Predisql library @https://github.com/JakSprats/
predis

- Jak

p.s. took 5 days, only because it came in the middle of a huge
rewrite :)
> ...
>
> read more »

NailK

unread,
Nov 11, 2010, 9:09:08 AM11/11/10
to redisql-dev
Hi Jak,

Sounds impressive. I will try it out.

Regards,
Nail

On Nov 11, 4:25 am, Jak Sprats <jakspr...@gmail.com> wrote:
> Hi Nail,
>
> LIMIT OFFSET is supported in release 0.1.1
>
> SELECT COUNT (*) on Range Queries on foreign keys are now N*O(1) where
> N is the number of distinct foreign keys.
>
> SELECT COUNT(*) where score = 1 -> 1*O(1)
> SELECT COUNT(*) where score BETWEEN 1 AND 3 -> 3*O(1)
>
> It will be REAL FAST.
> This gist ((https://gist.github.com/669378- which is very primitive)
> ...
>
> read more »

Jak Sprats

unread,
Dec 3, 2010, 1:31:33 AM12/3/10
to redisql-dev
Hi Nailk and All

I changed the SELECT ORDER BY col LIMIT OFFSET syntax to the
following:
"SELECT cols FROM tbls WHERE clause LIMIT n OFFSET m"

The "OFFSET" must be explicit ... this was to simplify the case where
SELECT ... STORE is used in conjuction w/ LIMIT OFFSET ...

- Jak

p.s. NailK, I fixed a bug where OFFSET was dropping the last row

On Nov 11, 7:09 am, NailK <nail.kasha...@gmail.com> wrote:
> Hi Jak,
>
> Sounds impressive. I will try it out.
>
> Regards,
> Nail
>
> On Nov 11, 4:25 am, Jak Sprats <jakspr...@gmail.com> wrote:
>
> > Hi Nail,
>
> > LIMIT OFFSET is supported in release 0.1.1
>
> > SELECT COUNT (*) on Range Queries on foreign keys are now N*O(1) where
> > N is the number of distinct foreign keys.
>
> > SELECT COUNT(*) where score = 1 -> 1*O(1)
> > SELECT COUNT(*) where score BETWEEN 1 AND 3 -> 3*O(1)
>
> > It will be REAL FAST.
> > This gist ((https://gist.github.com/669378-which is very primitive)
> ...
>
> read more »

Jak Sprats

unread,
Dec 3, 2010, 1:34:51 AM12/3/10
to redisql-dev

command reference updated @ http://code.google.com/p/alchemydatabase/wiki/CommandReference

and FYI, the project home page is being totally transitioned to
http://code.google.com/p/alchemydatabase/

the mailing list and issues are gonna take some time ... and all the
github repos ... renaming is a pain-in-the-ass

On Dec 2, 11:31 pm, Jak Sprats <jakspr...@gmail.com> wrote:
> Hi Nailk and All
>
> I changed the SELECT ORDER BY col LIMIT OFFSET syntax to the
> following:
> "SELECT cols FROM tbls WHERE clause LIMIT n OFFSET m"
>
> The "OFFSET" must be explicit ... this was to simplify the case where
> SELECT ... STORE is used in conjuction w/ LIMIT OFFSET ...
>
> - Jak
>
> p.s. NailK, I fixed a bug where OFFSET was dropping the last row
>
> On Nov 11, 7:09 am, NailK <nail.kasha...@gmail.com> wrote:
>
> > Hi Jak,
>
> > Sounds impressive. I will try it out.
>
> > Regards,
> > Nail
>
> > On Nov 11, 4:25 am, Jak Sprats <jakspr...@gmail.com> wrote:
>
> > > Hi Nail,
>
> > > LIMIT OFFSET is supported in release 0.1.1
>
> > > SELECT COUNT (*) on Range Queries on foreign keys are now N*O(1) where
> > > N is the number of distinct foreign keys.
>
> > > SELECT COUNT(*) where score = 1 -> 1*O(1)
> > > SELECT COUNT(*) where score BETWEEN 1 AND 3 -> 3*O(1)
>
> > > It will be REAL FAST.
> > > This gist ((https://gist.github.com/669378-whichis very primitive)
> ...
>
> read more »
Reply all
Reply to author
Forward
0 new messages