Row iterator fails to sync with page iterator in Cassandra Pagination

90 views
Skip to first unread message

hemabs

unread,
Sep 26, 2014, 1:20:29 PM9/26/14
to java-dri...@lists.datastax.com

I have a table "person" with 500 records.

int pageSize = 5;
Select selectStatement=QueryBuilder.select().from("person");
selectSt.setFetchSize(5);
ResultSet results = session.execute(selectStatement);
List<ResultSet> resultsList = new ArrayList<ResultSet>();
int pageCounter = 0;
// I need records that are in page 2 and page 3 only

while(!results.isExhausted())
{   
            if (results.getAvailableWithoutFetching() <= pageSize ) 
             {
                   pageCounter ++;

                   if(pageCounter == 2 || pageCounter == 3) {
                        resultsList.add(results);
                   }

              } 

              if(pageCounter>=3) break;

              if(!results.isFullyFetched())
                  results.fetchMoreResults()
}


for (ResultSet r:resultsList) {
            Iterator<Row> rowsIterator = r.iterator();
            while (rowsIterator.hasNext()) {
                Row row = rowsIterator.next();
                System.out.println("row:"+row);
            }
}

The issue here is when I iterate through row iterator from the resultsList; the row iterator actually starts to iterate from the first record of first page. To get the records which are in page 2 and page 3, I unnecessarily have to iterate through every row this way. Also the first element(first page) of resultsList exhuast all records giving no opportunity for the second element(second page) to iterate. When i get the results from page 2 and page 3 and add them to the resultsList, it is supposed to store only resultset from page 2 and page 3 and not from the first page to all the way to 100 pages.

Can anyone let me know what is that I am doing wrong? I am stuck with this for couple of weeks now.

Thanks. Appreciate the help

Daniel Anderson

unread,
Sep 26, 2014, 3:36:22 PM9/26/14
to java-dri...@lists.datastax.com
Instead of resultsList.add(results), you would need to extract the rows from results and add it to a list.. to reach your goal of getting page 2 and 3.

Read the documentation on ResultSet. The object has internal state.

The bigger problem is that SELECT * FROM person; may not produce a deterministic ordering of results. I believe C* will order the table scan by partition key token, so the order would change if new person partitions were added.

Pagination is not something cassandra can do well on it's own.

One option would be to pull in all of the items from the person table, sort them in memory and then find the page you want. If performance and memory usage is not a concern, this may be something to consider. You could speed it up or limit memory usage with some clever code. Personally, I would avoid something like this because it will not scale for large tables and heavy traffic.

The best option would be to use SOLR from DataStax Enterprise to index the person table. Then, you can do real pagination through the SOLR apis.

Good luck!

Dan

hemabs

unread,
Sep 26, 2014, 4:13:17 PM9/26/14
to java-dri...@lists.datastax.com
Hi Daniel,

Thanks for the reply. If I have to fetch results from the above table for pages 2 and 3 why is it that I have to iterate every row from page 1?
Is this a drawback in Cassandra?

Thanks. Appreciate the help.

hemabs

unread,
Sep 26, 2014, 5:03:47 PM9/26/14
to java-dri...@lists.datastax.com
Here is the updated code:
int pageSize = 5;
Select selectStatement=QueryBuilder.select().from("person");
selectSt.setFetchSize(5);
ResultSet results = session.execute(selectStatement);
int pageCounter = 0;
// I need records that are in page 2 and page 3 only

while(!results.isExhausted())
{   
            if (results.getAvailableWithoutFetching() <= pageSize ) 
             {
                   pageCounter ++;

                   if(pageCounter == 2 || pageCounter == 3) {//this is the code with issue where the records are printed from first row of page 1
                        Iterator<Row> rowsIterator = results.iterator();
                        while (rowsIterator.hasNext()) {
                            Row row = rowsIterator.next();
                            System.out.println("row:"+row);
                    }

              } 

              if(pageCounter>=3) break;

              if(!results.isFullyFetched())
                  results.fetchMoreResults()
}
Why is the rowIterator always iterating from first row of first page? When i want it to iterate on page and 3.

Daniel Anderson

unread,
Sep 30, 2014, 11:08:57 AM9/30/14
to java-dri...@lists.datastax.com
ResultSet effectively works like a queue. iterator().next() and one() pops off the top of the queue. fetchMoreResults() adds to the end of the queue. If you want to get to page three, you need to pop off enough elements to get to page three.

Dan

hemabs

unread,
Sep 30, 2014, 3:20:00 PM9/30/14
to java-dri...@lists.datastax.com
Thank you very much Daniel. This is very helpful.
Reply all
Reply to author
Forward
0 new messages