How to perform a graph query where the result depends on another Class (data script included)

205 views
Skip to first unread message

SHak

unread,
Jan 13, 2014, 11:00:55 AM1/13/14
to orient-...@googlegroups.com
Hi,

We are currently in the process of evaluating OrientDB and Neo4J for graph database, my task is to prototype the OrientDB while a colleague implements a Neo4J prototype.  I'm using the 1.6.3 and have read the entire website + book, running everything on the command line and using SQL syntax to execute queries.

I'm implementing an example of "label" translations.  I cannot find a similar example out there to help get it right.  Hoping you could put me on the right track.  Here is the scenario with complete data script.

Setup: 

- I have 3 classes;  User, Language and Label.  
- Label "is" in a language
- Label "translatesTo" another Label.
- User "speaks" a language  (user is only there to complicate things because it has edges to language) 

The goal is to start with a label, "Hello" and should find the label "Marhaba".  The label 'Marhaba' is not directly linked to "Hello" but in its graph and "Marhaba" is the only one that is linked to language arabic.

Basically, the goal is to start with "Hello", find any other label in its graph that "is" "Language.name" = 'arabic'.

Questions
- how to write such query? is it even possible?
- I'm worried about performance? as it has to start with labels, find all the labels in it's graph and then filter by language.  Lets not forget that a label can be the same in multiple languages.

Future Questions :)
Assuming that the previous Question has a solution, I was thinking to also have a count on the edge/link between 2 labels to see how many times it was used. ( I assume it's possible to increment a counter on an edge)
- can the previous query be written to first filter by language and then "order by" the property count of the edge.

for example,  
"Hello" is linked to "Bonjour" (french)  count: 5
"Hello" is linked to "Salut" (also french) count: 10

When I search for a label in the network of "Hello" in French, I would like to order by the count of the edge.  Is that possible?

Thank you in advance. 

======================================================================
#drop database if already exists
drop database remote:/locahost/touca root [enter password];

#create touca database
create database remote:localhost/touca root [enter password] local;

#create classes/Vectors
create class User extends V;
create class Language extends V;
create class Label extends V;

#create Links/Edges
create class speaks extends E;
create class is extends E;
create class translatesTo extends E;

#create User Data
create vertex User set name = 'Luca';
create vertex User set name = 'Joe';

#create Language Data
create vertex Language set name = 'En-uk';
create vertex Language set name = 'En-us';
create vertex Language set name = 'Fr-fr';
create vertex Language set name = 'Ru-ru';
create vertex Language set name = 'Ar-sy';

#create Label Data
create vertex Label set name = 'Hello';
create vertex Label set name = 'Salut';
create vertex Label set name = 'Good day';
create vertex Label set name = 'Bonjour';
create vertex Label set name = 'Hallo';
create vertex Label set name = 'Marhaba';


#create the links between the User->Language
create edge speaks from (select from User where name = 'Luca') to (select from Language where name = 'En-uk');
create edge speaks from (select from User where name = 'Luca') to (select from Language where name = 'En-us');
create edge speaks from (select from User where name = 'Luca') to (select from Language where name = 'Fr-fr');
create edge speaks from (select from User where name = 'Luca') to (select from Language where name = 'Ar-sy');

create edge speaks from (select from User where name = 'Joe') to (select from Language where name = 'En-uk');
create edge speaks from (select from User where name = 'Joe') to (select from Language where name = 'En-us');

#create the links between label and language
create edge is from (select from Label where name = 'Hello') to (select from Language where name = 'En-uk');
create edge is from (select from Label where name = 'Hello') to (select from Language where name = 'En-us');
create edge is from (select from Label where name = 'Salut') to (select from Language where name = 'Fr-fr');
create edge is from (select from Label where name = 'Good day') to (select from Language where name = 'En-uk');
create edge is from (select from Label where name = 'Bonjour') to (select from Language where name = 'Fr-fr');
create edge is from (select from Label where name = 'Hallo') to (select from Language where name = 'Ru-ru');
create edge is from (select from Label where name = 'Marhaba') to (select from Language where name = 'Ar-sy');

#create the links between labels
create edge translatesTo from (select from Label where name = 'Hello') to (select from Label where name = 'Good day');
create edge translatesTo from (select from Label where name = 'Hello') to (select from Label where name = 'Hallo');
create edge translatesTo from (select from Label where name = 'Hello') to (select from Label where name = 'Salut');
create edge translatesTo from (select from Label where name = 'Good day') to (select from Label where name = 'Bonjour');
create edge translatesTo from (select from Label where name = 'Bonjour') to (select from Label where name = 'Marhaba');
create edge translatesTo from (select from Label where name = 'Hallo') to (select from Label where name = 'Marhaba');
=======================================================================================

SHak

unread,
Jan 14, 2014, 3:04:39 AM1/14/14
to orient-...@googlegroups.com
So, I made some progress.  The following query returns what I need.  What do you think?  can it be better?

SELECT FROM (TRAVERSE * FROM (SELECT FROM label WHERE name like '%Hello%') while @class = 'Label') where out_is.name ='Ar-sy';


SHak

unread,
Jan 14, 2014, 3:50:03 AM1/14/14
to orient-...@googlegroups.com
until I hear back if this is the correct approach, I'm following on my 2nd problem which is to order by the Edge property.

I changed these 2 edges to have a count property

create edge translatesTo from (select from Label where name = 'Hello') to (select from Label where name = 'Good day') SET count=10;
create edge translatesTo from (select from Label where name = 'Hello') to (select from Label where name = 'Hallo') SET count=5;

the problem I see is that my previous query filters to @class='Label' and the edge properties are of class='translateTo' and they are on a separate record when I run this query.

TRAVERSE * FROM (SELECT FROM label WHERE name like '%Hello%') while @class = 'Label'

I thought that maybe expand on this query might help but it's too late because of filter of @class, any suggestions?



Artem Orobets

unread,
Jan 14, 2014, 8:20:36 AM1/14/14
to orient-...@googlegroups.com
I think this one would be better
select from (traverse out('translatesTo') from (select from Label where name = 'Hello')) where out('is').name contains 'Ar-sy'

Traversing only by out('translatesTo') is faster than traversing and than filtering. Moreover if you have another type of edges between labels they also will be traversed.

Best regards,
Artem Orobets

Orient Technologies

the Company behind OrientDB



2014/1/14 SHak <safwa...@gmail.com>
So, I made some progress.  The following query returns what I need.  What do you think?  can it be better?

SELECT FROM (TRAVERSE * FROM (SELECT FROM label WHERE name like '%Hello%') while @class = 'Label') where out_is.name ='Ar-sy';


--
 
---
You received this message because you are subscribed to the Google Groups "OrientDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to orient-databa...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Artem Orobets

unread,
Jan 14, 2014, 8:46:43 AM1/14/14
to orient-...@googlegroups.com
I'm not sure that understand usecase for second query. Do you want to get count only of first level translation or all transitive translation of some word (Like "Hello" -> "Marhaba")?

For the first case you can simply use query like:
select outV().name, inV().name, count from translatesTo where outV().name = 'Hello'

For the second one I don't understand how do you want to calculate count.

Best regards,
Artem Orobets

Orient Technologies

the Company behind OrientDB



2014/1/14 SHak <safwa...@gmail.com>
until I hear back if this is the correct approach, I'm following on my 2nd problem which is to order by the Edge property.

SHak

unread,
Jan 14, 2014, 8:50:22 AM1/14/14
to orient-...@googlegroups.com
Thanks. I see now the advantage of out('translatesTo').  But I don't understand the purpose of "contains". I tried with out('is').name = 'Ar-sy' and it doesn't return the data.  I assume because out('is') is a collection but I thought that = would work on anything including collections.

Artem Orobets

unread,
Jan 14, 2014, 9:03:53 AM1/14/14
to orient-...@googlegroups.com
No it is not. I think it could be a little bit confusing if equality operator has contains behavior for collections.

SHak

unread,
Jan 14, 2014, 9:06:31 AM1/14/14
to orient-...@googlegroups.com
For those that have a direct edge, I would like to have their counts, for the others it could stay null or 0.

If the data is structured as in the example below, Allo and Salut are a direct link while Bonjour is indirect via italian ciao.

"Hello"->"Allo" (5 times)
"Hello"->"Salut" (15 times)
"Hello"->"Ciao"
"Ciao"->"Bonjour"

if I query for french translations, it would be nice to order the result as in example below, so the most common translation is on top.  As edge/counts are added between "Hello" and "Bonjour", then we can show the most used translation on top.
"Salut" (15 times)
"Allo" (5 times)
"Bonjour" (null or 0 times)

SHak

unread,
Jan 14, 2014, 9:08:34 AM1/14/14
to orient-...@googlegroups.com
Got it. thanks. my misunderstanding when I read the documentation and saw "any", I assumed any type of result.  Thanks for the clarification.

Artem Orobets

unread,
Jan 14, 2014, 1:10:11 PM1/14/14
to orient-...@googlegroups.com
The idea is to find edges to the translations instead of translations itself, then you can take a projection of a result into translation_name, count and depth of projection. Then during processing of query result you can take count value if depth = 1 or 0 otherwise.

With growing of complexity of traversal logic it's become more and more complicated to write a proper SQL query. Possibly make sense to take a look at gremlin queries. This query language is more flexible and allow to implement some queries in more simple way. However it has its cost of performance, some gremlin queries can work much longer then its SQL variant.

SHak

unread,
Jan 15, 2014, 2:49:56 AM1/15/14
to orient-...@googlegroups.com
Thanks Artem.  I'm having trouble running gremlin command line so I'll try that as soon as the problem is solved. I created a POST for this as well  https://groups.google.com/forum/#!topic/orient-database/c6Mob7K0CMs

SHak

unread,
Jan 16, 2014, 8:52:02 AM1/16/14
to orient-...@googlegroups.com
Hi Artem, I have a great idea to solve my issue that I think will work but I need your expertise.

I was wondering, I would like to avoid a complicated traversal solution as you mention with gremlin that could be a lot more expensive (I still cannot get gremlin working out of the box).  But would like to continue using SQL.  So I've been thinking...could the following be done?

Select using the query you provided before excluding the direct connections from the result.  then do another select that goes only 1 depth and get the count on the edges and finally do a union of both and order by count.  Not sure what the syntax would be so I'll pseudo code it and maybe you can let me know if it's possible?

I understand that it'll run 2 queries but probably not expensive overall especially that the 2nd query is direct links only.

Select name, count from
UNION
{
    select name, null as count from (traverse out('translatesTo') from (select from Label where name = 'Hello')) where out('is').name contains 'Ar-sy' EXCLUDE FIRST LEVEL
}
WITH
{
    select name,count of edge from (traverse out('translatesTo') from (select from Label where name = 'Hello')) where out('is').name contains 'Ar-sy' LIMIT TO 1 LEVEL of depth
}
Order by count desc 

(I regret having asked both questions here, because if we can get this working, I'm sure others can benefit. if we can get it working, I'll repost it with answer in case someone else is interested)

SHak

unread,
Jan 16, 2014, 11:47:11 AM1/16/14
to orient-...@googlegroups.com
So I'm trying to figure an approach to my previous reply and I hit another problem. As soon as I give my edge a property, the query that I'm using stops working and won't return anything that has properties, seems that the in/out is now with the edge and not between the labels and that of course affects the result greatly.  

Before I add the property, the result is 3 records after I add the properties to my edge, it's only one record.  When I do select from label, I see that "in_translateTo" is another class.

How can I rewrite this query to get the proper 3 records?

Please use setup below to reproduce.

Query
=====
Select from (traverse out('translatesTo') from (select from Label where name = 'Hello')) where out('is').name contains 'Fr-fr'


setup
===========================================================
#drop database if already exists
drop database remote:/locahost/touca root pass;

#create touca database
create database remote:localhost/touca root pass local;

#create classes/Vectors
create class User extends V;
create class Language extends V;
create class Label extends V;

#create Links/Edges
create class speaks extends E;
create class is extends E;
create class translatesTo extends E;

#create User Data
create vertex User set name = 'Luca';
create vertex User set name = 'Joe';

#create Language Data
create vertex Language set name = 'En-uk';
create vertex Language set name = 'En-us';
create vertex Language set name = 'Fr-fr';
create vertex Language set name = 'Ru-ru';
create vertex Language set name = 'Ar-sy';

#create Label Data
create vertex Label set name = 'Hello';
create vertex Label set name = 'Salut';
create vertex Label set name = 'Ciao';
create vertex Label set name = 'Good day';
create vertex Label set name = 'Bonjour';
create vertex Label set name = 'Hallo';
create vertex Label set name = 'Marhaba';

#index
create property speaks.out LINK;
create property speaks.in LINK;
CREATE INDEX unique_speaks ON speaks (in, out) UNIQUE;

#create property translatesTo.out LINK;
#create property translatesTo.in LINK;
#CREATE INDEX unique_translatesTo ON translatesTo (in, out) UNIQUE;


#create the links between the User->Language
create edge speaks from (select from User where name = 'Luca') to (select from Language where name = 'En-uk');
create edge speaks from (select from User where name = 'Luca') to (select from Language where name = 'En-us');
create edge speaks from (select from User where name = 'Luca') to (select from Language where name = 'Fr-fr');
create edge speaks from (select from User where name = 'Luca') to (select from Language where name = 'Ar-sy');

create edge speaks from (select from User where name = 'Joe') to (select from Language where name = 'En-uk');
create edge speaks from (select from User where name = 'Joe') to (select from Language where name = 'En-us');

#create the links between label and language
create edge is from (select from Label where name = 'Hello') to (select from Language where name = 'En-uk');
create edge is from (select from Label where name = 'Hello') to (select from Language where name = 'En-us');
create edge is from (select from Label where name = 'Salut') to (select from Language where name = 'Fr-fr');
create edge is from (select from Label where name = 'Ciao') to (select from Language where name = 'Fr-fr');
create edge is from (select from Label where name = 'Good day') to (select from Language where name = 'En-uk');
create edge is from (select from Label where name = 'Bonjour') to (select from Language where name = 'Fr-fr');
create edge is from (select from Label where name = 'Hallo') to (select from Language where name = 'Ru-ru');
create edge is from (select from Label where name = 'Marhaba') to (select from Language where name = 'Ar-sy');

#create the links between labels
create edge translatesTo from (select from Label where name = 'Hello') to (select from Label where name = 'Good day');
create edge translatesTo from (select from Label where name = 'Hello') to (select from Label where name = 'Hallo');
create edge translatesTo from (select from Label where name = 'Hello') to (select from Label where name = 'Salut') SET ncount = 5;
create edge translatesTo from (select from Label where name = 'Hello') to (select from Label where name = 'Ciao') SET ncount = 10;
create edge translatesTo from (select from Label where name = 'Good day') to (select from Label where name = 'Bonjour');
create edge translatesTo from (select from Label where name = 'Bonjour') to (select from Label where name = 'Marhaba');
create edge translatesTo from (select from Label where name = 'Hallo') to (select from Label where name = 'Marhaba');


Luca Garulli

unread,
Jan 17, 2014, 9:40:35 AM1/17/14
to orient-database
Hi,
If I understood well with your domain to retrieve the most ranked translation you could do:

select from ( select expand( out('translatesTo') ) from Label where name = 'Hello' ) order by out_translatesTo.size() desc

To look to the direct translation in French:

select from ( select expand( out('translatesTo') ) from Label where name = 'Hello' ) where out('is') in 'Fr-fr' order by out_translatesTo.size() desc

Now to pass by any other words you could do:

select *, $depth from (
  traverse both('translatesTo') from (
    select from Label where name = 'Hello' )
  while $depth < 4 )
where $depth > 0 and out('is').name IN 'Fr-fr' order by $depth

Note the order by: the shortest path is on the top.

Or you could start playing on the concept of "shortestPath" to find the shortest path between the word "Hello" and the "French" language:

select shortestPath( $from, $to ).asString() let $from = ( select from Label where name = 'Hello' ), $to = (select from Language where name = 'Fr-fr') )

Result:

[v(Label)[#13:0], v(Label)[#13:2], v(Language)[#12:2]]


Lvc@


SHak

unread,
Jan 17, 2014, 10:15:51 AM1/17/14
to orient-...@googlegroups.com
Hi Luca,

select *, $depth from (
  traverse both('translatesTo') from (
    select from Label where name = 'Hello' )
  while $depth < 4 )
where $depth > 0 and out('is').name IN 'Fr-fr' order by $depth

this query gives me half of what I need but I'm concerned about a couple of things.  First, Lets say that I have 40 languages. with 40 labels.  And each one is only connected from one to one to one.  This is the worst case scenario of course.
i.e.  Hello->Bonjour->Ciao->Hallo->..->40th word.

Question 1: Will $depth < 4 not be an issue then?
Question 2: How does this query differ in processing/performance from this query suggested by Artem? select from (traverse out('translatesTo') from (select from Label where name = 'Hello')) where out('is').name contains 'Ar-sy'

The other half that I'm missing from this query is how to get the weight for those that are directly connected.  As explained in earlier post, if Hello is directly translated to French Salut and has been accepted by users 15 times (set as count on the edge). And the french Bonjour has only been translated via italian ciao then I still want that word but the weight/count is 0 or null in this case.  it looks like this.

Hello->Salut (count on edge is 15)
Hello->Ciao
Ciao->Bonjour

As you can see "Hello" is not directly connected to Bonjour so it naturally has no count, so we can assume it's 0 or null if you wish.  While Hello is directly linked to Salut and has been used 15 times.

so when we run the suggested query above, I would like to have result such as this, (ordered by count desc)
Label     Count
Salut      15    (the count here is 15 because Hello is linked to Salut directly with a count on the edge=15)
Bonjour   0 (or null, the counter here is 0/null because Hello was never linked directly and this is retrieved via another vertex linked to hello)

Finally, here is my approach to it as in my previous comment to Artem, I just don't know the proper syntax to do this, there might be a better approach because my "solution" requires 2 queries and a union.  https://groups.google.com/d/msg/orient-database/uhlXh8_Bo2U/eR4ZDoQXo3cJ

Thanks

SHak

unread,
Jan 17, 2014, 10:19:44 AM1/17/14
to orient-...@googlegroups.com
Luca,

I've adjusted my script to match the request above. it has the counts on 2 of the edges.

create edge is from (select from Label where name = 'Ciao') to (select from Language where name = 'It-it');
create edge is from (select from Label where name = 'Good day') to (select from Language where name = 'En-uk');
create edge is from (select from Label where name = 'Bonjour') to (select from Language where name = 'Fr-fr');
create edge is from (select from Label where name = 'Hallo') to (select from Language where name = 'Ru-ru');
create edge is from (select from Label where name = 'Marhaba') to (select from Language where name = 'Ar-sy');

#create the links between labels
create edge translatesTo from (select from Label where name = 'Hello') to (select from Label where name = 'Good day');
create edge translatesTo from (select from Label where name = 'Hello') to (select from Label where name = 'Hallo');
create edge translatesTo from (select from Label where name = 'Hello') to (select from Label where name = 'Salut') SET ncount = 5;
create edge translatesTo from (select from Label where name = 'Hello') to (select from Label where name = 'Ciao') SET ncount = 10;
create edge translatesTo from (select from Label where name = 'Good day') to (select from Label where name = 'Bonjour');
create edge translatesTo from (select from Label where name = 'Bonjour') to (select from Label where name = 'Marhaba');
create edge translatesTo from (select from Label where name = 'Hallo') to (select from Label where name = 'Marhaba');



Reply all
Reply to author
Forward
0 new messages