Support for paging queries?


Keith Freeman

Aug 1, 2013, 7:37:25 PM
to java-dri...@lists.datastax.com
Hello,

I don't see any support in the Java API for paging queries. Am I missing it? That is, I want to run "select * from table;" in a loop, fetching N rows at a time.

Thanks.

Michael Figuiere

Aug 1, 2013, 7:54:40 PM
to java-dri...@lists.datastax.com
Cassandra 1.2.x doesn't support paging queries. You have to page manually, using the LIMIT keyword and an appropriate WHERE clause. For example:

1st query: SELECT * FROM user LIMIT 100;
Nth query: SELECT * FROM user WHERE token(id) > token(last_received_id) LIMIT 100;

Where 'user' is a table with a primary key column named 'id'.
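
A minimal sketch of how those two statements might be assembled in Java, using plain strings and no driver calls. The class and method names are hypothetical, and the '?' bind marker standing in for last_received_id is an assumption (it presumes the statement is prepared and the last key of the previous page is bound to it):

```java
/** Builds the CQL text for manual token-based paging (hypothetical helper). */
final class TokenPagingQueries {

    private TokenPagingQueries() {}

    /** First page: no lower bound on the token. */
    static String firstPage(String table, int pageSize) {
        return "SELECT * FROM " + table + " LIMIT " + pageSize;
    }

    /**
     * Later pages: resume after the token of the last key received.
     * The '?' is a bind marker for that key (an assumption of this sketch).
     */
    static String nextPage(String table, String keyColumn, int pageSize) {
        return "SELECT * FROM " + table
                + " WHERE token(" + keyColumn + ") > token(?)"
                + " LIMIT " + pageSize;
    }

    public static void main(String[] args) {
        // prints: SELECT * FROM user LIMIT 100
        System.out.println(firstPage("user", 100));
        // prints: SELECT * FROM user WHERE token(id) > token(?) LIMIT 100
        System.out.println(nextPage("user", "id", 100));
    }
}
```

The calling loop would run the first statement once, then repeat the second with the last id it received until a page comes back with fewer rows than the limit.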

Starting with Cassandra 2.0 (to be released later this summer, but already available in beta), we'll support paging (see https://issues.apache.org/jira/browse/CASSANDRA-4415), so you'll be able to iterate over the result set while the driver fetches rows on the fly. You'll be able to control the number of rows fetched at a time. See https://github.com/datastax/java-driver/tree/v2-wip for a first look at what we are preparing. Note that, as the branch name implies, this is still a work in progress, so consider it experimental for now.


Michael

Keith Freeman

Aug 2, 2013, 10:15:48 AM
to java-dri...@lists.datastax.com
Thanks. I had seen support for paging queries in the Astyanax library (the "AllRowsQuery" feature); I guess it uses the token function under the covers. I had also read about the token function in the Cassandra 1.2 docs, but they don't make it very clear what it does.

This leads me to another question: with other, more mature Java client libraries available (Hector, Astyanax), why start another one? How does the DataStax Java driver compare to those, and why would a developer choose it?

Aaron Daubman

Aug 2, 2013, 10:34:45 AM
to java-dri...@lists.datastax.com
The DataStax java-driver is (was) the only client to (really) support CQL and the new binary protocol. That would be the main reason to use it. If you have been using Thrift and plan to continue doing so, you won't be able to use the java-driver at all. If you are using CQL, you won't be able to use Hector, Astyanax or other Thrift-based client libraries (unless you are _really_ fond of execute_cql_query).

Perhaps there are efforts to bring better CQL support to these clients (I haven't looked in a few months), and maybe even binary protocol support, but if there are I am not aware of them.

Keith Freeman

Aug 2, 2013, 11:01:07 AM
to java-dri...@lists.datastax.com
OK, that makes sense. But it's not clear to me whether the DataStax Java driver always uses the binary protocol. For instance, this code from the "first client" example:
session.execute(
      "INSERT INTO simplex.songs (id, title, album, artist, tags) " +
      "VALUES (" +
          "756716f7-2e54-4715-9f00-91dcbea6cf50," +
          "'La Petite Tonkinoise'," +
          "'Bye Bye Blackbird'," +
          "'Joséphine Baker'," +
          "{'jazz', '2013'})" +
          ";");

...passes all the data values as a string. Does the driver always translate them to the binary protocol when communicating with the server, or do I need to use a different interface (e.g. QueryBuilder)?

Alex Popescu

Aug 2, 2013, 11:49:22 AM
to java-dri...@lists.datastax.com


On Friday, August 2, 2013 8:01:07 AM UTC-7, Keith Freeman wrote:
Ok, that makes sense.  But what's not clear to me is whether the datastax java driver is always using the binary protocol?  

Short answer: I've quickly scanned through the driver's code, and it looks to me like the binary protocol is used in both cases.

Longer answer: the string-based query is wrapped in a SimpleStatement (which implements Query), and from there execution follows the same path, which makes me think the binary protocol is used.

Probably someone with better knowledge of the code could give a yea or nay to the above comments :-).

:- a)

Michael Lasmanis

Aug 2, 2013, 2:56:17 PM
to java-dri...@lists.datastax.com
Michael,

How would you implement 'token(last_received_id)' with the QueryBuilder? I can't seem to find a tokenizer method for a value (I only found the token(columnName) part).

Thanks
Michael

Michael Lasmanis

Aug 6, 2013, 8:52:20 PM
to java-dri...@lists.datastax.com
Nudge....

Alex Popescu

Aug 9, 2013, 3:42:06 PM
to java-dri...@lists.datastax.com
Michael,

Apologies for not replying. We've had a few crazy days, but we'll make sure to follow up.

Keith Freeman

Aug 16, 2013, 12:29:45 PM
to java-dri...@lists.datastax.com
For the benefit of anyone visiting this thread in the future: the token() and LIMIT method is not sufficient when you have wide rows; it will miss data.

There's a recently started thread on the cassandra-user list about this. The idea is that LIMIT will sometimes cut off results partway through a wide row, and the following SELECT will start from the next row, so the remaining results from the previous row will never be returned.

Correct paging is significantly more complicated, and is described here: http://www.datastax.com/dev/blog/cql3-table-support-in-hadoop-pig-and-hive
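
To make the failure concrete, here is a small in-memory simulation of the token() and LIMIT loop. It is only a sketch and does not touch Cassandra: the class name, the Row type, the token values and the sample data are all made up for illustration, and query() merely imitates what "SELECT ... WHERE token(id) > token(last) LIMIT n" would return over rows sorted by token.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory imitation of token()+LIMIT paging over wide rows (hypothetical demo). */
final class WideRowPagingDemo {

    /** One CQL row: the token of its partition key plus some value. */
    static final class Row {
        final long token;
        final String value;
        Row(long token, String value) { this.token = token; this.value = value; }
    }

    /** Imitates the paged SELECT; afterToken == null means "first query, no WHERE". */
    static List<Row> query(List<Row> table, Long afterToken, int limit) {
        List<Row> page = new ArrayList<>();
        for (Row r : table) {                       // table is sorted by token
            if (page.size() == limit) break;        // LIMIT cuts the result here
            if (afterToken == null || r.token > afterToken) page.add(r);
        }
        return page;
    }

    /** The naive loop: restart after the token of the last row received (limit > 0). */
    static List<String> pageAll(List<Row> table, int limit) {
        List<String> seen = new ArrayList<>();
        Long last = null;
        while (true) {
            List<Row> page = query(table, last, limit);
            for (Row r : page) seen.add(r.value);
            if (page.size() < limit) break;         // short page: nothing left
            last = page.get(page.size() - 1).token;
        }
        return seen;
    }

    /** Two wide rows: token 10 holds three CQL rows, token 20 holds two. */
    static List<Row> sampleTable() {
        List<Row> t = new ArrayList<>();
        t.add(new Row(10, "a1")); t.add(new Row(10, "a2")); t.add(new Row(10, "a3"));
        t.add(new Row(20, "b1")); t.add(new Row(20, "b2"));
        return t;
    }

    public static void main(String[] args) {
        // prints [a1, a2, b1, b2]: "a3" is never returned
        System.out.println(pageAll(sampleTable(), 2));
    }
}
```

With a limit of 2, the first page ends in the middle of the token-10 row, the next query skips everything at token 10, and "a3" is lost. With a limit large enough to cover the whole table, all five rows come back.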



Ophir Radnitz

Sep 28, 2013, 3:47:14 AM
to java-dri...@lists.datastax.com
Thank you for this post, really helpful.


On Fri, Sep 27, 2013 at 10:35 AM, Jan Algermissen <alger...@acm.org> wrote:



[Just posted a reply to this that got stuck in the list for moderation - never mind, because afterwards, Google turned up the answer:]


Great job! Thanks.

Jan





 
