default fetchSize

Adil C.

Dec 12, 2014, 11:37:33 AM

to java-dri...@lists.datastax.com

Hi,

I can't find the default fetchSize for a statement. What value is used if it is not set?

Thanks

Andrew Tolbert

Dec 12, 2014, 11:41:43 AM

to java-dri...@lists.datastax.com

Hi Adil,

The default fetch size is 5000. It can be configured using QueryOptions (http://www.datastax.com/drivers/java/2.0/com/datastax/driver/core/QueryOptions.html#setFetchSize(int)), which can be set on the Cluster.Builder with withQueryOptions (http://www.datastax.com/drivers/java/2.0/com/datastax/driver/core/Cluster.Builder.html#withQueryOptions(com.datastax.driver.core.QueryOptions)).

Thanks,

Andy

Adil C.

Dec 12, 2014, 12:00:11 PM

to java-dri...@lists.datastax.com

ok thanks a lot.

Adil

ja

May 20, 2015, 8:55:42 AM

to java-dri...@lists.datastax.com

Hi Andrew,

We are using Cassandra 1.2.18.1 with the DataStax driver 2.1.4. When I ran a query that matches more than 5000 records without specifying any fetch size, I got all the records when iterating through the result set. Does this mean the default fetch size is not used? Our intention is to get all available records without pagination. Based on this observation, is it enough to specify just the query, without setting any LIMIT or fetch size on the statement? Please clarify.

Code snippet that was tried out :

Statement st = new SimpleStatement("SELECT xyz from abc ...");
ResultSet rs = cassandraConnManager.getSession().execute(st);
for (com.datastax.driver.core.Row row : rs.all()) {
    System.out.println(row.getUUID("xyz"));
}

Thanks,

Joseph

Olivier Michallat

May 20, 2015, 9:29:36 AM

to java-dri...@lists.datastax.com

When you iterate over the result set, the driver automatically sends background queries to retrieve additional results. The fetch size applies to each query. This is described here: http://datastax.github.io/java-driver/2.0.10.1/features/paging/
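
To illustrate why iterating can return more rows than the fetch size, here is a minimal, driver-free sketch of an iterator that pulls the next page when the current one runs out. The class `PagingIteratorSketch`, its fields, and the in-memory list standing in for the server are all hypothetical. The real driver does this through the native protocol; this model simply fetches on demand and is not the driver's implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical model of a paged result set; not part of the DataStax driver.
class PagingIteratorSketch implements Iterator<Integer> {
    private final List<Integer> source;   // stand-in for the rows held by the server
    private final int fetchSize;          // maximum rows returned per fetch
    private List<Integer> currentPage = new ArrayList<>();
    private int posInPage = 0;
    private int nextOffset = 0;           // stand-in for the paging state
    int fetches = 0;                      // number of round trips made so far

    PagingIteratorSketch(List<Integer> source, int fetchSize) {
        this.source = source;
        this.fetchSize = fetchSize;
        fetchNextPage();                  // the initial query returns the first page
    }

    private void fetchNextPage() {
        int end = Math.min(nextOffset + fetchSize, source.size());
        currentPage = source.subList(nextOffset, end);
        nextOffset = end;
        posInPage = 0;
        fetches++;
    }

    // Rows that can be read before another fetch is needed.
    int availableWithoutFetching() {
        return currentPage.size() - posInPage;
    }

    @Override
    public boolean hasNext() {
        if (posInPage < currentPage.size()) return true;
        if (nextOffset >= source.size()) return false;
        fetchNextPage();                  // transparent to the caller
        return posInPage < currentPage.size();
    }

    @Override
    public Integer next() {
        if (!hasNext()) throw new NoSuchElementException();
        return currentPage.get(posInPage++);
    }

    // Builds n dummy rows.
    static List<Integer> rows(int n) {
        List<Integer> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) list.add(i);
        return list;
    }

    public static void main(String[] args) {
        PagingIteratorSketch it = new PagingIteratorSketch(rows(12000), 5000);
        int count = 0;
        while (it.hasNext()) {
            it.next();
            count++;
        }
        // prints "rows=12000 fetches=3"
        System.out.println("rows=" + count + " fetches=" + it.fetches);
    }
}
```

In this model the caller sees all 12000 rows even though no single fetch returns more than 5000, which matches the behaviour described above for servers that support paging.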

If you want to stop after the first query (i.e. never send background queries), the following would work:

int remaining = rs.getAvailableWithoutFetching();
for (Row row : rs) {
    // process your row...
    if (--remaining == 0) {
        break;
    }
}

I think LIMIT would be better for clarity, but it's your choice.

--

Olivier Michallat

Driver & tools engineer, DataStax

Andrew Tolbert

May 20, 2015, 9:53:17 AM

to java-dri...@lists.datastax.com

Hi Joseph,

> We are using Cassandra 1.2.18.1 with the Datastax Driver 2.1.4.

In addition to what Olivier said, Cassandra 1.2 does not support paging, as it was added in protocol v2, which was first introduced in Cassandra 2.0. So I agree with Olivier's suggestion to use LIMIT.

Thanks!

Andy

Joseph Anish Alex

May 20, 2015, 1:06:21 PM

to java-dri...@lists.datastax.com

Thanks. My requirement is to get all available rows without putting too much load on the server. For C* 1.x, does the driver ensure that background queries fetch the next set of records when the iterator has finished with the current set?

Olivier Michallat

May 21, 2015, 12:15:23 PM

to java-dri...@lists.datastax.com

No, since Cassandra 1.2 uses protocol v1, paging is not supported at the protocol level.

You'll have to handle this manually:

- if you page across partitions, the token() CQL function allows you to start after the last retrieved partition key

- if you page within a partition, use conditions on your clustering columns

- if you do both, you'll have to combine both approaches.

Example (3 rows per partition):

cqlsh:test> create table my_table(k int, v int, primary key(k,v)) with CLUSTERING ORDER BY (v asc);
cqlsh:test> insert into my_table(k, v) values (1, 1);
cqlsh:test> insert into my_table(k, v) values (1, 2);
cqlsh:test> insert into my_table(k, v) values (1, 3);
cqlsh:test> insert into my_table(k, v) values (2, 1);
cqlsh:test> insert into my_table(k, v) values (2, 2);
cqlsh:test> insert into my_table(k, v) values (2, 3);

Paging within a partition:

cqlsh:test> select * from my_table limit 2;

 k | v
---+---
 1 | 1
 1 | 2

cqlsh:test> select * from my_table where k=1 and v>2 limit 2;

 k | v
---+---
 1 | 3

Paging across partitions:

cqlsh:test> select * from my_table where token(k)>token(1) limit 2;

 k | v
---+---
 2 | 1
 2 | 2
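
Combining both approaches into a full scan could look roughly like the sketch below. It does not use the driver: an in-memory `TreeMap` stands in for `my_table`, and its key order is assumed to stand in for token order. The class and method names (`ManualPagingSketch`, `pageWithinPartition`, `pageAfterPartition`, `scanAll`) are hypothetical. Each helper only models the CQL query named in its comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory model of manual paging; not driver code.
class ManualPagingSketch {
    // Partition key k -> sorted clustering values v.
    // Assumption: TreeMap key order stands in for token(k) order.
    static final NavigableMap<Integer, NavigableSet<Integer>> TABLE = new TreeMap<>();

    static {
        // Same data as the example table: 2 partitions, 3 rows each.
        for (int k = 1; k <= 2; k++) {
            TABLE.put(k, new TreeSet<Integer>(Arrays.asList(1, 2, 3)));
        }
    }

    // Models: select * from my_table where k = ? and v > ? limit ?
    static List<int[]> pageWithinPartition(int k, int lastV, int limit) {
        List<int[]> rows = new ArrayList<>();
        NavigableSet<Integer> values = TABLE.get(k);
        if (values == null) return rows;
        for (int v : values.tailSet(lastV, false)) {
            if (rows.size() == limit) break;
            rows.add(new int[] {k, v});
        }
        return rows;
    }

    // Models: select * from my_table where token(k) > token(?) limit ?
    // A null lastK models the very first query, which has no condition.
    static List<int[]> pageAfterPartition(Integer lastK, int limit) {
        List<int[]> rows = new ArrayList<>();
        Map<Integer, NavigableSet<Integer>> rest =
                (lastK == null) ? TABLE : TABLE.tailMap(lastK, false);
        for (Map.Entry<Integer, NavigableSet<Integer>> e : rest.entrySet()) {
            for (int v : e.getValue()) {
                if (rows.size() == limit) return rows;
                rows.add(new int[] {e.getKey(), v});
            }
        }
        return rows;
    }

    // Reads every row, at most `limit` rows per simulated query.
    static List<String> scanAll(int limit) {
        List<String> seen = new ArrayList<>();
        List<int[]> page = pageAfterPartition(null, limit);
        while (!page.isEmpty()) {
            for (int[] row : page) seen.add(row[0] + "|" + row[1]);
            int[] last = page.get(page.size() - 1);
            // First finish the partition of the last row, then move on.
            page = pageWithinPartition(last[0], last[1], limit);
            if (page.isEmpty()) page = pageAfterPartition(last[0], limit);
        }
        return seen;
    }

    public static void main(String[] args) {
        // prints [1|1, 1|2, 1|3, 2|1, 2|2, 2|3]
        System.out.println(scanAll(2));
    }
}
```

The loop always tries to finish the current partition before moving to the next token range, so a page that ends mid-partition does not skip the remaining rows of that partition.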

--

Olivier Michallat

Driver & tools engineer, DataStax

Joseph Anish Alex

May 22, 2015, 8:07:50 AM

to java-dri...@lists.datastax.com

Thanks. In my use case, the partition key has 3 columns (k1, k2, k3) and there are 2 clustering columns (c1, c2). c1 is a timestamp and c2 is a GUID, i.e. there could be many events for the same c1. The query uses only one partition at a time.

select * where k1=A and k2=B and K3=C and c1>=t1 and c1<=t2 limit 500

Following this approach, if I use c1 > (latest timestamp from the first query), events that share that timestamp would be lost, while if I use c1 >= (latest timestamp from the first query), there could be duplicates. I guess we need to filter these duplicates manually (by comparing with the previous result set). Please confirm.
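
The manual filtering described here could be sketched as follows. This is only an in-memory model under stated assumptions, not driver code: `Event`, `query`, and `readAll` are hypothetical names, the list is assumed to be sorted by (c1, c2) as a partition would be, and `limit` is assumed to be larger than the number of events that can share one timestamp (otherwise the loop cannot make progress and the sketch throws).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of paging with an inclusive bound plus client-side dedup.
class InclusiveBoundPagingSketch {
    static final class Event {
        final long c1;    // timestamp clustering column
        final String c2;  // GUID clustering column
        Event(long c1, String c2) { this.c1 = c1; this.c2 = c2; }
    }

    // Models: select * ... where c1 >= ? and c1 <= ? limit ?
    // `partition` is assumed to be sorted by (c1, c2).
    static List<Event> query(List<Event> partition, long from, long to, int limit) {
        List<Event> page = new ArrayList<>();
        for (Event e : partition) {
            if (page.size() == limit) break;
            if (e.c1 >= from && e.c1 <= to) page.add(e);
        }
        return page;
    }

    // Returns the c2 of every event in [t1, t2], each exactly once.
    static List<String> readAll(List<Event> partition, long t1, long t2, int limit) {
        List<String> result = new ArrayList<>();
        Set<String> seenAtLastTs = new HashSet<>(); // c2 values already read at `from`
        long from = t1;
        while (true) {
            List<Event> page = query(partition, from, t2, limit);
            int added = 0;
            for (Event e : page) {
                if (e.c1 == from && seenAtLastTs.contains(e.c2)) continue; // duplicate
                if (e.c1 != from) {        // moved on to a later timestamp
                    seenAtLastTs.clear();
                    from = e.c1;
                }
                seenAtLastTs.add(e.c2);
                result.add(e.c2);
                added++;
            }
            if (page.size() < limit) return result;  // short page: nothing left
            if (added == 0) {
                throw new IllegalStateException(
                        "limit is too small for the events sharing timestamp " + from);
            }
        }
    }

    public static void main(String[] args) {
        List<Event> partition = Arrays.asList(
                new Event(10, "a"), new Event(10, "b"),
                new Event(20, "c"), new Event(20, "d"), new Event(20, "e"),
                new Event(30, "f"));
        // prints [a, b, c, d, e, f]
        System.out.println(readAll(partition, 10, 30, 4));
    }
}
```

Only the c2 values seen at the most recent timestamp need to be remembered, because an inclusive lower bound can re-return rows from that timestamp only.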

Further, if I only need c2 from this query (i.e. the payload per row is small), is it safe to provide a high enough limit (say, LIMIT 100,000), assuming we know there will be no more than 50,000 records per day, so that I don't have to deal with manual pagination?
