[Cypher] Full-text search query to find all nodes where properties match and related nodes' property matches.


Rory C

May 8, 2012, 2:54:13 PM
to ne...@googlegroups.com
I am new to Neo4j and Cypher.
neo4j version: 1.6 community
OS: Solaris 10
jdk: 1.6.0_21 

I have two types of nodes: "Video" and "Tag". 

Video node properties:
type:String="video"
title:String
description:String
url:String

Tag node properties:
type:String= "tag"
name:String

indexes:
type: exact index on "type" property of all nodes
video: fulltext index on "name" and "description" of video nodes
tag: fulltext index on "name" of tag nodes

Videos can have many tags and tags can be on many videos. I want to write a Cypher query that will produce one list of Videos that meet the following criteria:
1) All Videos where the title or the description properties meet the full text criteria for a keyword search
2) All Videos that are related to a tag where the name property of the tag meets the full text criteria for a keyword search
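To pin down the result being asked for, here is a minimal in-memory Python model of the two criteria. It is not Neo4j or Cypher code: the sample data, the `TAGGED` relationship set, and the `words` helper are hypothetical, and simple lowercase word matching stands in for Lucene's full-text analysis.

```python
import re

# Hypothetical sample data: video id -> properties, tag id -> name,
# and (video id, tag id) pairs standing in for video-->tag relationships.
VIDEOS = {
    1: {"title": "Block party", "description": "street festival"},
    2: {"title": "Cooking", "description": "pasta basics"},
    3: {"title": "Lego", "description": "building with a block set"},
}
TAGS = {10: "block", 11: "food"}
TAGGED = {(2, 10), (2, 11)}


def words(text: str) -> set[str]:
    # Crude stand-in for a Lucene analyzer: lowercase word tokens.
    return set(re.findall(r"\w+", text.lower()))


def text_matches(keyword: str) -> set[int]:
    # Criterion 1: the keyword appears in the title or the description.
    kw = keyword.lower()
    return {vid for vid, p in VIDEOS.items()
            if kw in words(p["title"]) | words(p["description"])}


def tag_matches(keyword: str) -> set[int]:
    # Criterion 2: the video is related to a tag whose name contains the keyword.
    kw = keyword.lower()
    hits = {tid for tid, name in TAGS.items() if kw in words(name)}
    return {vid for vid, tid in TAGGED if tid in hits}


def search(keyword: str) -> set[int]:
    # The single desired list: the union of both criteria, without duplicates.
    return text_matches(keyword) | tag_matches(keyword)


print(sorted(search("block")))  # [1, 2, 3]
```

The single list asked for corresponds to `search`, a set union, so a video that meets both criteria appears only once.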

Sounds simple right?

Here is as close as I can get:
start tagged_videos=node:type(type="video"),
      matched_videos=node:video("name:keyword description:keyword"),
      tag=node:tag("name:keyword")
match tagged_videos-[r]->t
return tagged_videos, matched_videos

This returns 2 lists:
tagged_videos: all videos with a relationship to a tag that matches the keyword search
matched_videos: all videos where either the name property or the description property matches the keyword search

I'd like to get a single result list with all the videos in the above two lists. Is there an easy way to do that? like a UNION or something.

Thanks,
Rory

Noppanit

May 8, 2012, 5:32:01 PM
to ne...@googlegroups.com
Hi Rory,

Maybe my answer won't actually answer your question. I wanted to do exactly what you want to achieve, but I ended up using an external tool such as Lucene or Searchify to index the whole graph, with the node_id as the primary key. It'd be great to hear from somebody else.
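The external-index approach can be sketched as an inverted index that stores graph node ids as its keys. This is a hypothetical stand-in written in Python; it does not use the Lucene or Searchify APIs, and the node ids and texts are made up.

```python
import re
from collections import defaultdict


class NodeIndex:
    """Hypothetical inverted index: token -> set of graph node ids."""

    def __init__(self) -> None:
        self._postings: dict[str, set[int]] = defaultdict(set)

    def add(self, node_id: int, *texts: str) -> None:
        # Index every property value of the node under its node id,
        # so video and tag nodes share one index.
        for text in texts:
            for token in re.findall(r"\w+", text.lower()):
                self._postings[token].add(node_id)

    def search(self, keyword: str) -> set[int]:
        # Return node ids only; the caller resolves them in the graph.
        return set(self._postings.get(keyword.lower(), set()))


index = NodeIndex()
index.add(1, "Block party", "street festival")  # video node 1
index.add(2, "Cooking", "pasta basics")          # video node 2
index.add(10, "block")                            # tag node 10
print(sorted(index.search("Block")))  # [1, 10]
```

The ids returned by a search like this would then be used as start nodes for a graph query, with tag ids expanded to their videos.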

Best,
Toy.

Andres Taylor

May 9, 2012, 2:14:37 AM
to ne...@googlegroups.com
On Tue, May 8, 2012 at 8:54 PM, Rory C <rory...@cheerzoo.com> wrote:
> I'd like to get a single result list with all the videos in the above two lists. Is there an easy way to do that? like a UNION or something.

No, Cypher doesn't have anything similar to SQL's UNION. Should it?

Andrés

Luanne Coutinho

May 9, 2012, 2:18:25 AM
to ne...@googlegroups.com
A UNION for Cypher would be great :-)

I have an example right now that I could have possibly combined into a single query doing some convoluted stuff, but chose to split it into 4 queries and combine the results manually. What I actually wanted to do was a union of all of them (with distinct results).
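Combining several separate query results by hand amounts to a distinct union. A minimal Python sketch, assuming each query result arrives as a list of node ids; `union_distinct` and the sample lists are hypothetical.

```python
from typing import Iterable


def union_distinct(*results: Iterable[int]) -> list[int]:
    """Concatenate query results, dropping repeats but keeping first-seen order."""
    seen: set[int] = set()
    merged: list[int] = []
    for result in results:
        for node_id in result:
            if node_id not in seen:
                seen.add(node_id)
                merged.append(node_id)
    return merged


# Four hypothetical query results with overlapping node ids.
print(union_distinct([3, 1], [1, 7], [], [7, 3, 9]))  # [3, 1, 7, 9]
```

Keeping first-seen order preserves the ordering of the earlier queries; a plain set would do if order does not matter.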

-Luanne

Andres Taylor

May 9, 2012, 3:12:31 AM
to ne...@googlegroups.com
On Wed, May 9, 2012 at 8:18 AM, Luanne Coutinho <luanne....@gmail.com> wrote:
> A UNION for Cypher would be great :-)
>
> I have an example right now that I could have possibly combined into a single query doing some convoluted stuff, but chose to split it into 4 queries and combine the results manually. What I actually wanted to do was a union of all of them (with distinct results).

Good to know. Feedback like this is very important - thanks everyone for sharing this. Union shouldn't be too hard to do. I'll add it to the backlog.

Andrés 

Peter Neubauer

May 9, 2012, 3:44:37 AM
to ne...@googlegroups.com
Maybe you could create an issue for it, along with an example of where
it would be useful?

Cheers,

/peter neubauer

G:  neubauer.peter
S:  peter.neubauer
P:  +46 704 106975
L:   http://www.linkedin.com/in/neubauer
T:   @peterneubauer

If you can write, you can code - @coderdojomalmo
If you can sketch, you can use a graph database - @neo4j

Luanne Coutinho

May 9, 2012, 3:54:08 AM
to ne...@googlegroups.com
I can do that later today.

Peter Neubauer

May 9, 2012, 3:56:33 AM
to ne...@googlegroups.com
Thanks a lot Luanne!

Cheers,

/peter neubauer


Michael Hunger

May 9, 2012, 3:58:33 AM
to ne...@googlegroups.com
Rory,

Can you try this one:

start tagged_videos=node:type(type="video"),
      matched_videos=node:video("name:keyword description:keyword"),
      tag=node:tag("name:keyword")
where (tagged_videos-[r]->t OR tagged_videos = matched_videos)
return tagged_videos

Rory C

May 9, 2012, 11:13:59 AM
to ne...@googlegroups.com
Michael,

Well done! That's it! I had to tweak it a bit further, but it does pass my basic test cases. I will test more again tonight to verify. The final query I ended up with was:

start tagged_videos=node:type(type="video"),
      matched_videos=node:video("name:keyword description:keyword"),
      tag=node:tag("name:keyword")
where (tagged_videos-->tag OR tagged_videos = matched_videos)
return distinct tagged_videos

Changes I made:
1) relationships expressed in the where clause do not allow for naming/referencing of the relationship, so I changed the "-[r]->" to "-->"
2) added "distinct" keyword to the return statement as the videos related to matching tags were showing twice in the results.
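The duplicates have a simple cause: the start points are combined as a cross product, the where clause filters each combined row, and a video is returned once for every row that passes. A minimal Python model of that evaluation, using hypothetical node ids (a conceptual model, not Neo4j's actual execution):

```python
from itertools import product

videos = [1, 2, 3]          # node:type(type="video")
matched_videos = [1, 3]     # node:video("...") full-text hits
tags = [10]                 # node:tag("name:keyword") hits
tagged = {(2, 10)}          # hypothetical video-->tag relationships

rows = [
    video
    for video, matched, tag in product(videos, matched_videos, tags)
    # where (video-->tag OR video = matched)
    if (video, tag) in tagged or video == matched
]
print(rows)                       # [1, 2, 2, 3]
print(list(dict.fromkeys(rows)))  # return distinct: [1, 2, 3]
```

Video 2 passes once per full-text hit, so it shows up as many times as matched_videos has rows, which is what `distinct` removes.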

Thanks again to everyone for responding so quickly.

Regards,
Rory

Rory C

May 9, 2012, 2:21:47 PM
to ne...@googlegroups.com
All,

I may have spoken a bit soon. One additional catch seems to be that if any one of the start points returns 0 nodes, the whole query returns 0 results.
That is to say: if we have video nodes matching "keyword" on properties "name" and "description", but no tag nodes match "keyword" on property "name", then the whole query returns 0 results rather than returning the matching videos. To fix this I have made the following changes:

1) I have created a new fulltext index called "searchft"
       a) This index is on properties "name" and "description" for all video nodes.
       b) This index is on property "name" for all tag nodes.
2) I re-wrote the query to be as follows:

start video=node:type(type="video"),
      matched_node=node:searchft("name:block description:block"),
      tag=node:type(type="tag")
where ((matched_node=tag AND video-->tag) OR video = matched_node)
return distinct video

This seems to produce appropriate results in my 4 simple test cases:
1) "keyword" matches video.name or video.description but does NOT match on tag.name (returns videos where "keyword" is in either name or description)
2) "keyword" does NOT match on video.name or video.description but does match on tag.name (returns videos tagged with "keyword")
3) "keyword" matches video.name or video.description and matches on tag.name (returns the union of results from 1 and 2 above)
4) "keyword" does NOT match on video.name or video.description and does NOT match on tag.name (0 results)
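The four test cases can be replayed against a small Python model of the searchft query. This is a conceptual sketch, not Neo4j's execution: `matched_nodes` stands in for the searchft hits (a mix of video and tag ids), and the ids and relationships are hypothetical.

```python
from itertools import product


def run(videos, tags, matched_nodes, tagged):
    """Model of the searchft query; returns distinct video ids in row order."""
    rows = [
        video
        for video, matched, tag in product(videos, matched_nodes, tags)
        # where ((matched_node=tag AND video-->tag) OR video = matched_node)
        if (matched == tag and (video, tag) in tagged) or video == matched
    ]
    return list(dict.fromkeys(rows))  # return distinct video


VIDEOS, TAGS = [1, 2, 3], [10, 11]   # hypothetical node ids
TAGGED = {(2, 10), (3, 11)}          # hypothetical video-->tag relationships

print(run(VIDEOS, TAGS, [1], TAGGED))      # case 1: text only -> [1]
print(run(VIDEOS, TAGS, [10], TAGGED))     # case 2: tag only -> [2]
print(run(VIDEOS, TAGS, [1, 10], TAGGED))  # case 3: both -> [1, 2]
print(run(VIDEOS, TAGS, [], TAGGED))       # case 4: no hits -> []
```

Because the tag start point now covers all tags, only the full-text start point can be empty (given at least one video and one tag exist), and that is exactly case 4, where 0 results is the correct answer.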

This seems to work for me and matches the cases that I have tested so far.

If I want to add another dimension (such as a video category), it would mean adding the relevant properties of the new node type to the searchft index, adding a new start point for all the new nodes, and then adding the appropriate new filter to the where clause.

The question is: Is this the most efficient way to achieve appropriate results to satisfy the above 4 cases? My concern is that this solution won't scale. 
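A back-of-the-envelope view of the scaling concern: each start point multiplies the number of rows the where clause has to examine. The sizes below are made-up numbers, and the count is conceptual rather than a measurement of what Neo4j 1.6 does internally.

```python
from math import prod


def rows_examined(*start_point_sizes: int) -> int:
    """Rows in the cross product of the start points, before the where clause filters them."""
    return prod(start_point_sizes)


# Hypothetical sizes: 10,000 videos, 20 full-text hits, 500 tags.
print(rows_examined(10_000, 20, 500))      # 100000000
# Adding a category dimension with 50 categories as a fourth start point:
print(rows_examined(10_000, 20, 500, 50))  # 5000000000
```

Each extra dimension added as a start point multiplies the work again, which is why the concern about scaling is reasonable.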

Thanks,
Rory

Michael Hunger

May 9, 2012, 3:27:03 PM
to ne...@googlegroups.com
First, a quick note: the index lookups and merges are not graph queries but just index lookups, so in the first place nothing here leverages the graph.

tagged_videos=node:type(type="video") 

should never result in 0 rows, as that would mean you have no videos at all.

Only matched_videos might be 0,
and tag can probably be 0 too;

either of those would cause the total result to be 0, because the start points span a cross product.
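The cross-product effect can be checked directly: if any factor is empty, no rows are left for the where clause to keep, even though video 1 matched the text search. A minimal Python illustration with hypothetical ids:

```python
from itertools import product

videos = [1, 2, 3]    # node:type(type="video"): non-empty whenever videos exist
matched_videos = [1]  # full-text hits on video name/description
tags = []             # full-text hits on tag name: none for this keyword

rows = list(product(videos, matched_videos, tags))
print(rows)  # []: one empty start point empties the whole result
```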

While thinking about this: you could also go back to your original query. It could work if you use optional relationships:

start tagged_videos=node:type(type="video"),
      matched_videos=node:video("name:keyword description:keyword"),
      tag=node:tag("name:keyword")
match tagged_videos-[r?:TAG]->tag
where (r != null OR tagged_videos = matched_videos)
return distinct tagged_videos
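The optional-relationship idea can be modeled as a lookup that yields `None` instead of dropping the row. A minimal Python sketch, assuming the optional relationship points at the `tag` start node; the ids and the `tag_rels` mapping are hypothetical.

```python
from itertools import product

videos = [1, 2, 3]
matched_videos = [1]
tags = [10]
tag_rels = {(2, 10): "r1"}  # hypothetical (video, tag) -> TAG relationship id

rows = []
for video, matched, tag in product(videos, matched_videos, tags):
    # Optional relationship: r is None when the pattern has no match,
    # instead of the row being dropped.
    r = tag_rels.get((video, tag))
    # where (r != null OR tagged_videos = matched_videos)
    if r is not None or video == matched:
        rows.append(video)
print(rows)  # [1, 2]
```

The start points are still combined as a cross product, so an empty matched_videos or tag start point would still empty the result.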

It might also be possible to use _with_ and _collect_ to aggregate the videos that match, but I have to think about that.

Michael