How to safely iterate through a very large dataset with pagination?


Stuart Reynolds

Jun 9, 2016, 1:08:26 PM
to OrientDB
(Sorry for the repost -- my original question was a mess. Have deleted it and am reposting).

I'd like to iterate through a very large set of records in OrientDB.


So that the result doesn't fill up my machine's memory, I've tried to implement paginated queries, but I seem to be getting back:

  • duplicated documents
  • record sets shorter than the page size
  • an infinite series of results

The original Java method listed in the docs is as follows:


OSQLSynchQuery<ODocument> query = new OSQLSynchQuery<ODocument>("select from Customer LIMIT 20");
for (List<ODocument> resultset = database.query(query); !resultset.isEmpty(); resultset = database.query(query)) {
    ...
}


I've implemented this in Scala:


val query = new OSQLSynchQuery[ODocument]("select from Thing LIMIT 5")
var resultset = db.query[OResultSet[ODocument]](query)
while (!resultset.isEmpty()) {
  // process result set here
  resultset = db.query(query)
}


Here's the full example:


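import com.orientechnologies.orient.core.db.ODatabase.ATTRIBUTES._
import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx
import com.orientechnologies.orient.core.record.impl.ODocument
import com.orientechnologies.orient.core.sql.query.{OResultSet, OSQLSynchQuery}

// Build an unsaved document of class "Thing" with a single integer field x
def makeThing(x: Int) = {
  val doc = new ODocument("Thing")
  doc.field("x", x)
  doc
}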

val db: ODatabaseDocumentTx = new ODatabaseDocumentTx("memory:jsondb")
db.create()
db.set(MINIMUMCLUSTERS, 3)
db.set(CLUSTERSELECTION, "round-robin")
db.set(CONFLICTSTRATEGY, "content")
db.set(CHARSET, "UTF-8")


println("SAVING--------")

for (x <- 0 until 12) {
  val doc:ODocument = makeThing(x)
  val saved = db.save[ODocument](doc)
  println(saved)
}


println("\n\nQUERYING--------")

val query = new OSQLSynchQuery[ODocument]("select from Thing LIMIT 5")
var resultset = db.query[OResultSet[ODocument]](query)
while (!resultset.isEmpty()) {
  resultset.toArray.foreach(println)
  resultset = db.query(query)
  println("---------")
}


But here's the output:


SAVING--------
Thing#9:0{x:0} v1
Thing#10:0{x:1} v1
Thing#11:0{x:2} v1
Thing#9:1{x:3} v1
Thing#10:1{x:4} v1
Thing#11:1{x:5} v1
Thing#9:2{x:6} v1
Thing#10:2{x:7} v1
Thing#11:2{x:8} v1
Thing#9:3{x:9} v1
Thing#10:3{x:10} v1
Thing#11:3{x:11} v1



QUERYING--------
Thing#9:0{x:0} v1
Thing#9:1{x:3} v1
Thing#9:2{x:6} v1
Thing#9:3{x:9} v1
Thing#10:0{x:1} v1  # So far, so good...
---------
Thing#9:0{x:0} v1   # Already seen this
Thing#10:1{x:4} v1
Thing#10:2{x:7} v1
Thing#10:3{x:10} v1
Thing#11:0{x:2} v1
---------
Thing#9:0{x:0} v1    # Already seen this
Thing#11:1{x:5} v1
Thing#11:2{x:8} v1
Thing#11:3{x:11} v1  # Page cut short
---------
Thing#9:0{x:0} v1   # Already seen this!
---------
Thing#9:1{x:3} v1
Thing#9:2{x:6} v1
Thing#9:3{x:9} v1
Thing#10:0{x:1} v1
Thing#10:1{x:4} v1



Note that the DB is in memory, and no-one is simultaneously writing to the DB.


Using ODB client 2.1.1


What's the sane and safe way to iterate through a very large dataset? As far as I can see, the method in the docs is buggy.


Stuart Reynolds

Jun 14, 2016, 11:37:50 AM
to OrientDB
Bump!

I have confirmed that pagination still has this odd (buggy?) behavior in 2.1.19 and filed a bug.

Stuart Reynolds

Jun 15, 2016, 2:29:30 PM
to OrientDB
Does anyone know the recommended way for an ODB client to receive a large query result (>10000 records)?

OSQLAsynchQuery seemed like an alternative to pagination (which produces odd behavior), but OSQLAsynchQuery gives me this distressing warning if I ask for >10000 results:

INFO: {db=jsondb} [TIP] Query 'SELECT FROM Thing' returned a result set with more than 10000 records. Check if you really need all these records, or reduce the resultset by using a LIMIT to improve both performance and used RAM

My example is on SO here.

Should I ignore the warning, or should I be doing something else?

pabloa

Jun 15, 2016, 7:01:31 PM
to OrientDB
We need cursors.

Something with a signature like

Iterator[T] cursor = ODatabaseDocument.query("sql", 400); // 400 = local buffer size. In this case 400 T will be returned and cached locally.

or https://github.com/orientechnologies/orientdb/issues/2425
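
For illustration only, the cursor idea above could look roughly like this Java sketch. It is not an OrientDB API: BufferedCursor, skipLimit and drain are hypothetical names, and the in-memory list is a stand-in for a query run with SKIP/LIMIT. The point is just the shape: an Iterator that fetches one page at a time and never holds more than the buffer size locally.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

final class BufferedCursorSketch {

    /** Iterator that pulls results page by page, holding at most bufferSize items locally. */
    static final class BufferedCursor<T> implements Iterator<T> {
        private final BiFunction<Integer, Integer, List<T>> fetchPage; // (skip, limit) -> page
        private final int bufferSize;
        private final Deque<T> buffer = new ArrayDeque<>();
        private int fetched = 0;           // rows pulled so far; used as the next skip offset
        private boolean exhausted = false; // set once the source returns a short page

        BufferedCursor(BiFunction<Integer, Integer, List<T>> fetchPage, int bufferSize) {
            if (bufferSize <= 0) throw new IllegalArgumentException("bufferSize must be positive");
            this.fetchPage = fetchPage;
            this.bufferSize = bufferSize;
        }

        @Override
        public boolean hasNext() {
            if (buffer.isEmpty() && !exhausted) {
                List<T> page = fetchPage.apply(fetched, bufferSize);
                fetched += page.size();
                buffer.addAll(page);
                // A short (or empty) page means the source has no more rows.
                exhausted = page.size() < bufferSize;
            }
            return !buffer.isEmpty();
        }

        @Override
        public T next() {
            if (!hasNext()) throw new NoSuchElementException();
            return buffer.removeFirst();
        }
    }

    /** Hypothetical in-memory source standing in for a "SKIP ? LIMIT ?" query. */
    static List<Integer> skipLimit(List<Integer> rows, int skip, int limit) {
        int from = Math.min(skip, rows.size());
        int to = Math.min(from + limit, rows.size());
        return new ArrayList<>(rows.subList(from, to));
    }

    /** Reads rowCount rows (0, 1, 2, ...) through the cursor and returns them in order. */
    static List<Integer> drain(int rowCount, int bufferSize) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < rowCount; i++) rows.add(i);
        Iterator<Integer> cursor =
                new BufferedCursor<Integer>((skip, limit) -> skipLimit(rows, skip, limit), bufferSize);
        List<Integer> out = new ArrayList<>();
        while (cursor.hasNext()) out.add(cursor.next());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(drain(12, 5));
    }
}
```

Offset-based paging like this generally re-reads the skipped rows on every page and assumes the result order is stable between calls, so it is a workaround rather than a real server-side cursor.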

Pablo

Stuart Reynolds

Jun 15, 2016, 7:48:54 PM
to orient-...@googlegroups.com
+1 

--

---
You received this message because you are subscribed to a topic in the Google Groups "OrientDB" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/orient-database/NQ-UwR2Rf0M/unsubscribe.
To unsubscribe from this group and all its topics, send an email to orient-databa...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Luigi Dell'Aquila

Jun 16, 2016, 3:50:40 AM
to orient-...@googlegroups.com
Hi guys, 

Query cursors are one of the hot topics for 3.0 development.
In the meantime I suggest you use RID-based paging or SKIP/LIMIT.
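
As a rough sketch of what RID-based paging means: remember the record id of the last record in each page and ask only for records whose id is strictly greater. The Java below models that loop without OrientDB, so everything in it is an assumption for illustration: Rid, fetchPage, sampleStore and iterateAll are hypothetical names, and the sorted map stands in for the Thing class from the original post. Against the database itself, the fetch would presumably be a query of the form "select from Thing where @rid > ? LIMIT 5" with the last record's identity passed as the parameter; please check that against the docs for your version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class RidPagingSketch {

    /** Stand-in for a record id: cluster id plus position, ordered so that #9:3 sorts before #10:0. */
    static final class Rid implements Comparable<Rid> {
        final int cluster;
        final long position;

        Rid(int cluster, long position) {
            this.cluster = cluster;
            this.position = position;
        }

        @Override
        public int compareTo(Rid other) {
            int byCluster = Integer.compare(cluster, other.cluster);
            return byCluster != 0 ? byCluster : Long.compare(position, other.position);
        }

        @Override
        public String toString() {
            return "#" + cluster + ":" + position;
        }
    }

    /** Hypothetical stand-in for "... where @rid > ? LIMIT n": at most limit entries after last. */
    static List<Map.Entry<Rid, Integer>> fetchPage(NavigableMap<Rid, Integer> store, Rid last, int limit) {
        List<Map.Entry<Rid, Integer>> page = new ArrayList<>();
        for (Map.Entry<Rid, Integer> e : store.tailMap(last, false).entrySet()) {
            if (page.size() == limit) break;
            page.add(e);
        }
        return page;
    }

    /** Rebuilds the layout from the original post: x = 0..11 spread round-robin over clusters 9, 10, 11. */
    static NavigableMap<Rid, Integer> sampleStore() {
        NavigableMap<Rid, Integer> store = new TreeMap<>();
        for (int x = 0; x < 12; x++) {
            store.put(new Rid(9 + x % 3, x / 3), x);
        }
        return store;
    }

    /** Pages through the whole store and returns every x value in the order visited. */
    static List<Integer> iterateAll(NavigableMap<Rid, Integer> store, int pageSize) {
        List<Integer> seen = new ArrayList<>();
        Rid last = new Rid(-1, -1); // lower bound that sorts before every real record id
        List<Map.Entry<Rid, Integer>> page = fetchPage(store, last, pageSize);
        while (!page.isEmpty()) {
            for (Map.Entry<Rid, Integer> e : page) {
                seen.add(e.getValue());
            }
            last = page.get(page.size() - 1).getKey(); // resume strictly after the last record seen
            page = fetchPage(store, last, pageSize);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(iterateAll(sampleStore(), 5));
    }
}
```

With the 12 records from the original post and a page size of 5, this visits each record exactly once, in RID order: x = 0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8, 11. Because each page starts from an explicit RID rather than from state kept inside the query object, no record is repeated and the loop ends on the first empty page.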

I'll check the original issue and see if I can fix it quickly

Thanks

Luigi


pabloa

Jun 16, 2016, 2:42:09 PM
to OrientDB
I'll do that.

Do you have an ETA for version 3.0?

P

Luigi Dell'Aquila

Jun 17, 2016, 4:11:56 AM
to orient-...@googlegroups.com
The ETA for 3.0 is approximately the end of this year.

Thanks

Luigi

alessand...@gmail.com

Jun 17, 2016, 4:12:45 AM
to OrientDB
Hi,
you could see the roadmap.

Kind regards,
Alessandro