Random functionality

Skip to first unread message

Josh Harrison

Aug 28, 2015, 12:00:44 PM8/28/15
to OrientDB
I've seen a number of posts over the months/years asking about the implementation of a random order sorter, so that I can randomize the sorting of the results I get back. (SELECT FROM document ORDER BY RANDOM or similar syntax) Many of them refer to this appearing in 2.1.
I'm on 2.1 - is this feature present? What is the syntax?
Is there support for easily adding "ORDER BY xyz" function support? Any examples of this, if so?

Josh Harrison

Sep 4, 2015, 4:16:50 PM9/4/15
to OrientDB
This would really be a major win for us, right now we're having to do some artificial windowing to get random edges and the performance is sorta terrible. Devs, anything you can point me to here?


Sep 7, 2015, 5:26:02 AM9/7/15
to OrientDB
OK - first, take this information with a grain of salt because I'm new to OrientDb and haven't actually rolled out a successful release but.....

Thought #1:  Get 'Groovy' with it....

I've been reading all the ways you can write server functions (store procedures) and one is Groovy which seems to relate to or also be called 'Gremlin' or 'TinkerPop' or 'Blueprints'.

https://github.com/tinkerpop/blueprints/wiki/OrientDB-Implementation - "Blueprints is the default Java API for OrientDB, so you don’t need to include additional modules. For more information look at OrientDB Blueprints API."

I ran across this 'Coin Step' yesterday when scanning the TinkerPop3 documentation.

From http://tinkerpop.incubator.apache.org/docs/3.0.0-incubating/#coin-step

Coin Step

To randomly filter out a traverser, use the coin()-step (filter). The provided double argument biases the "coin toss."

e.g.  gremlin> g.V().coin(0.5)

Order Step

When the objects of the traversal stream need to be sorted, order()-step (map) can be leveraged.

I noticed 'shuffle' -- "Randomizing the order of the traversers at a particular point in the traversal is possible with Order.shuffle."

e.g. gremlin> g.V().hasLabel('person').order().by(shuffle)

Groovy statements do work as server functions so maybe this is something that you could use?


Thought #2: 'JavaScript'

You can always write a store proc that generates a random number 0 to size() and steps ahead that number of steps.

For anyone that knows, please correct me.

Josh Harrison

Sep 7, 2015, 4:15:11 PM9/7/15
to OrientDB
Definitely interesting possibilities thank you!
I think the coin() doesn't decrease our traversal time, just gets us a more random sample - if we're looking to get 10k results, with a 50% random we'd end up traversing ~20k edges. Within those edges it'd be random what we get, but the runtime shouldn't be much better

I'm not sure about shuffle - that may do it, but I believe that the end result of shuffling your results is that the sample you've taken gets shuffled - not that shuffle the set of documents you go through to get the result. By that I mean if you had a set [0,1,2,3,4,5,6,7,8,9], if you selected 4 records you'd get [0,1,2,3]. I think with shuffle you'd get something like [2,3,0,1], not [6,2,8,3].

I really like the last possibility - we get the size of the edge counts, and select either a set percentage or a set number of random ints from the range of [0-size()], then use that result to get what we need. I'm not 100% sure this is easy to implement through the java API but we'll see!

Still, a SQL query with a randomized selection would be a great thing to have.

Ultimately what we need to do is a weighted random - all our edges have weights, and I need to traverse the edges in a weighted random fashion. If we're able to implement this in a server side function, we'd be in a good spot for our query run time.
Reply all
Reply to author
0 new messages