Aggregating weights, and random walk

Josh Harrison

Aug 10, 2015, 4:31:34 PM
to OrientDB
Hi all,
So, I've got the following rough structure:
A "document" vertex - the class is called "document"
A "metadata" vertex - the class is called "metadata"
A weighted edge connecting them - the class is called "metadata_of"

A single document may connect to many metadata vertexes, and many documents may connect to a single metadata vertex.

I have two primary questions

Starting from a random document, I'd like to write a function that will walk to another document some random number of steps away. The traversal should be weighted both on the centrality of the metadata vertices and the weight of the connections to those vertices. 
Is it possible to write stored scripts, in either the Orient query language or Gremlin, that will execute this task quickly and accept parameters to tune how strongly the weights and the centrality influence the random walk?
Any pointers on how I might start off with this? My primary language is Python, but I can make this work in whatever, if need be.

Second, if I wanted to take a collection of documents and aggregate all the weights of all their constituent metadata edges, how would I do this?
That is to say I have
(Doc 1)--(edge weight 0.562) -->(Metadata 1)
(Doc 1)--(edge weight 0.124) -->(Metadata 2)
(Doc 2)--(edge weight 0.553) -->(Metadata 1)
(Doc 2)--(edge weight 0.123) -->(Metadata 3)
(Doc 3)--(edge weight 0.234) -->(Metadata 1)
(Doc 3)--(edge weight 0.274) -->(Metadata 4)

I want to be able to select doc 1 and doc 3 and have a resulting set of weights like the following:
Metadata 1: 0.796
Metadata 4: 0.274
Metadata 2: 0.124
The exact method of combining the documents may change (not necessarily straight addition), but the end result would be the same: I want to be able to easily aggregate the weights of all the concept links from a definable set of documents.
Is there a good way to do this in the orient query language, or Gremlin?

I definitely have a preference for Gremlin for portability's sake, but if there's a fast/easy way to do it in the orient query language directly as well that'd be totally fine!
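
For reference, the aggregation described above can be sketched in plain Python, independent of OrientDB or Gremlin. This is only an in-memory illustration under assumptions: the edge list is copied from the example, while the function name `aggregate_weights` and its `combine` parameter are hypothetical and stand in for whatever combining rule ends up being used.

```python
from collections import defaultdict

# Edges from the example above: (document, metadata, weight).
EDGES = [
    ("Doc 1", "Metadata 1", 0.562),
    ("Doc 1", "Metadata 2", 0.124),
    ("Doc 2", "Metadata 1", 0.553),
    ("Doc 2", "Metadata 3", 0.123),
    ("Doc 3", "Metadata 1", 0.234),
    ("Doc 3", "Metadata 4", 0.274),
]


def aggregate_weights(edges, selected_docs, combine=sum):
    """Group edge weights by metadata vertex for the selected documents.

    `combine` is a hypothetical hook: any function that reduces a list of
    weights to a single number (sum, max, a mean, ...).
    """
    selected = set(selected_docs)
    grouped = defaultdict(list)
    for doc, meta, weight in edges:
        if doc in selected:
            grouped[meta].append(weight)
    combined = {meta: combine(weights) for meta, weights in grouped.items()}
    # Highest combined weight first, as in the listing above.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


result = aggregate_weights(EDGES, ["Doc 1", "Doc 3"])
for meta, weight in result:
    print(f"{meta}: {weight:.3f}")
```

With straight addition this prints the three lines listed above (Metadata 1: 0.796, Metadata 4: 0.274, Metadata 2: 0.124). Passing `combine=max`, for example, would keep only the strongest link per metadata vertex instead.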

Thanks,
Josh Harrison

alessand...@gmail.com

Aug 11, 2015, 6:17:35 AM
to OrientDB
Hi Josh,
For the second question, you can use a server-side function like this:

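var g=orient.getGraph();

// Expand the metadata_of edges of the selected documents, then sum the edge
// weights per target metadata vertex. Assumes the edge property is named "weight".
var query="select name, sum(weight) from (select weight,inV().name as name from (select expand(outE('metadata_of'))";
query+=" from (select from Document where name in " + names + " ))) group by name";

var list=g.command('sql',query);

return list;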

The function takes one parameter: names.

In your case, pass ['Doc 1','Doc 3'] as names.

Regards,
Alessandro

Josh Harrison

Aug 11, 2015, 12:01:28 PM
to OrientDB
Super, thanks Alessandro!

alessand...@gmail.com

Aug 12, 2015, 4:44:23 AM
to OrientDB
Hi Josh,
I did not understand your first question. Can you explain what you want to achieve?
What do you mean by "centrality of the metadata"?
Is your structure like the one in Figure 1, like the one in Figure 2, or neither?
Alessandro

Josh Harrison

Aug 12, 2015, 5:09:31 AM
to OrientDB
Figure 1 is an accurate representation of the data.
The goal is to, starting from a random document, take a random number of steps to another document. The navigation of the path would be weighted by how many inbound edges the metadata object in question has, as well as the weight of the individual edge between the document and the metadata object. 

Essentially, in your diagram, if I start at "document 4" and take a "random" walk to a metadata object, I will of course go to "metadata 2" (the only link possible from document 4). Then, say the weight of the (document 3)->(metadata 2) edge is 0.0001, and the weight of the (document 2)->(metadata 2) edge is 0.721. In most cases I then want to hop to document 2, as it has a much higher edge weight. That's not to say that in 30,000 iterations I wouldn't occasionally hop to document 3, though! We then generate a random number to see if we continue walking. Say we do.

Now that we're at document 2, we start the process over again. BUT, document 2 has two metadata vertices. Metadata 1 has two edges associated with it, and metadata 2 has three. I want the choice of the next metadata node to be weighted by the number of edges connecting to that node. In this case, I want to generally weight LESS connected nodes more highly, so in this situation we'd slightly prefer hopping to metadata 1 instead of metadata 2. From metadata 1, we end up on document 1. We then generate a random number to see if we continue walking. Say we don't.
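
To make "slightly prefer" concrete, here is a minimal sketch of one way to turn edge counts into hop probabilities. It assumes an inverse-degree weighting, which is only one possible reading of the description; the function name `metadata_step_probabilities` and the `strength` knob are hypothetical.

```python
def metadata_step_probabilities(degrees, strength=1.0):
    """Probability of hopping to each metadata vertex from a document.

    degrees:  {metadata name: number of edges attached to it}, each >= 1
    strength: hypothetical tuning knob; 0 ignores connectivity, larger
              values favour less-connected vertices more strongly.
    """
    scores = {meta: degree ** -strength for meta, degree in degrees.items()}
    total = sum(scores.values())
    return {meta: score / total for meta, score in scores.items()}


# Document 2 in the walkthrough: metadata 1 has two edges, metadata 2 has three.
probs = metadata_step_probabilities({"metadata 1": 2, "metadata 2": 3})
for meta, p in probs.items():
    print(f"{meta}: {p:.2f}")
```

With `strength=1.0` the split is about 0.60 for metadata 1 and 0.40 for metadata 2, so metadata 1 is mildly favoured. Setting `strength=0` makes the choice uniform, and raising it sharpens the preference for sparsely connected nodes.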

Our weighted random walk might have moved from document 4 -> metadata 2 -> document 2 -> metadata 1 -> document 1

In the real graph, the number of documents visited could in theory be anywhere from one to all of them.
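
The walk described above could be prototyped in memory before moving it into a stored function. The sketch below is an illustration under assumptions, not an OrientDB API: only the two metadata 2 weights (0.721 and 0.0001) come from this thread, the other weights are made up, and the names `weighted_random_walk`, `continue_prob`, `weight_power` and `degree_power` are hypothetical. It assumes all edge weights are positive.

```python
import random

# Hypothetical graph matching the walkthrough: (document, metadata, weight).
# Only the 0.721 and 0.0001 weights are quoted in the thread.
EDGES = [
    ("document 1", "metadata 1", 0.5),
    ("document 2", "metadata 1", 0.5),
    ("document 2", "metadata 2", 0.721),
    ("document 3", "metadata 2", 0.0001),
    ("document 4", "metadata 2", 0.5),
]


def build_index(edges):
    """Index edge weights by document and by metadata vertex."""
    by_doc, by_meta = {}, {}
    for doc, meta, weight in edges:
        by_doc.setdefault(doc, {})[meta] = weight
        by_meta.setdefault(meta, {})[doc] = weight
    return by_doc, by_meta


def weighted_random_walk(edges, start, rng, continue_prob=0.5,
                         weight_power=1.0, degree_power=1.0, max_hops=1000):
    """Return the visited path: document, metadata, document, ..."""
    by_doc, by_meta = build_index(edges)
    path, doc = [start], start
    for _ in range(max_hops):
        metas = list(by_doc.get(doc, {}))
        if not metas:
            break  # document has no metadata
        # Document -> metadata: heavier edges and less-connected metadata win.
        meta_scores = [by_doc[doc][m] ** weight_power
                       * len(by_meta[m]) ** -degree_power for m in metas]
        meta = rng.choices(metas, weights=meta_scores)[0]
        # Metadata -> document: weighted by edge weight, never straight back.
        docs = [d for d in by_meta[meta] if d != doc]
        if not docs:
            break  # no other document shares this metadata
        doc_scores = [by_meta[meta][d] ** weight_power for d in docs]
        doc = rng.choices(docs, weights=doc_scores)[0]
        path += [meta, doc]
        # Random draw to decide whether the walk continues.
        if rng.random() >= continue_prob:
            break
    return path


print(weighted_random_walk(EDGES, "document 4", random.Random(42)))
```

The two exponents are the tuning parameters asked about earlier: `weight_power=0` ignores edge weights, `degree_power=0` ignores how connected a metadata vertex is, and `continue_prob` controls the expected length of the walk.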

Hopefully this explained what I'm trying to do a little better! :)

If this is too complicated to do as a canned function in orient I think that's totally reasonable, someone else is working on writing a more direct Java interface which may give us somewhat better control over all this.
Thanks,
Josh

alessand...@gmail.com

Aug 12, 2015, 9:13:01 AM
to OrientDB
Hi Josh,
I wrote this small Java function.
Let me know if it does what you expect.

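import java.util.Iterator;
import com.orientechnologies.orient.core.sql.query.OSQLSynchQuery;
import com.tinkerpop.blueprints.Direction;
import com.tinkerpop.blueprints.Edge;
import com.tinkerpop.blueprints.Vertex;
import com.tinkerpop.blueprints.impls.orient.OrientGraph;

// Greedy (deterministic) walk, to be placed inside a class: at each step take
// the least-connected metadata vertex, then the other document with the
// heaviest edge into it. There is no randomness yet.
private Vertex myFunction(OrientGraph g) {
    String name = "Doc 4";
    int steps = 2;

    // Look up the starting document by name.
    Iterable<Vertex> result = g.command(new OSQLSynchQuery<Vertex>(
            "select from Document where name = '" + name + "'")).execute();
    Iterator<Vertex> it = result.iterator();
    if (!it.hasNext()) {
        return null; // starting document not found
    }
    Vertex vP = it.next();

    for (int counter = 0; counter < steps; counter++) {
        // Metadata vertices reachable from the current document.
        Iterable<Vertex> listMetadata = g.command(new OSQLSynchQuery<Vertex>(
                "select expand(out('metadata_of')) from Document where @rid = "
                + vP.getId())).execute();

        // Pick the metadata vertex with the fewest incoming edges.
        Vertex vMetadata = null;
        int min = 0;
        for (Vertex v : listMetadata) {
            int numero = 0; // number of incoming metadata_of edges
            for (Edge ignored : v.getEdges(Direction.IN, "metadata_of")) {
                numero++;
            }
            if (vMetadata == null || numero < min) {
                vMetadata = v;
                min = numero;
            }
        }
        if (vMetadata == null) {
            break; // current document has no metadata
        }

        // Among the other documents linked to it, pick the heaviest edge.
        float peso = 0.0f; // best weight so far
        Vertex v0 = null;
        for (Edge e : vMetadata.getEdges(Direction.IN, "metadata_of")) {
            Number w = e.getProperty("weight");
            float pesoArco = w.floatValue(); // weight of this edge
            Vertex vOut = e.getVertex(Direction.OUT);
            if ((v0 == null || pesoArco > peso) && !vP.getId().equals(vOut.getId())) {
                v0 = vOut;
                peso = pesoArco;
            }
        }
        if (v0 == null) {
            break; // no other document shares this metadata
        }
        vP = v0;
    }
    return vP;
}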

Alessandro

Josh Harrison

Aug 12, 2015, 12:06:43 PM
to orient-...@googlegroups.com
Fantastic, that's a great framework for what we'll need to put together. I'll pass this on to my backend developer! Thanks again, Alessandro.
