REST::Neo4p -- Creating unique nodes. (Perl)

Sunit Jain

Mar 18, 2015, 5:30:15 PM

to ne...@googlegroups.com

First, congratulations on creating such a great Perl driver for Neo4j. I really appreciate the work you must have put into it.

I've been trying to use this driver to create a database for our meta*omic data. I was able to put together some Perl code by following some slides, the Neo4j blog post about this driver, and the MetaCPAN description. However, I'm stuck at a point where I'm no longer sure what's going on. I'm hoping you might be able to help.

As a side note, the example on the Neo4j blog seemed very limited and is about two years old. Is there a more recent version somewhere, maybe one with best practices? If not, I'd be happy to start one explaining what I did for my current project, once I have at least one successful run. It won't be as insightful, but it'll be something.

Goal:

Create unique Taxa nodes, and have each gene locus that belongs to a taxon relate to it with an "IN_ORGANISM" relationship:

(Taxa)<-[:IN_ORGANISM]-(Locus)

More details can be found in createDB.pl (lines: 326-352), here

Issue:

Here is the perl snippet of my code to create unique 'Taxa' nodes:

Perl snippet to create unique relations to Taxa:

When I run this script, it creates the exact same taxa node 94 times! A quick grep of my CSV showed 94 instances of that taxon, so the script essentially created a new node each time it encountered the species. I also created scaffold, loci, COG, PFam and Project nodes in much the same way, and in all of those cases only unique nodes were created. The only difference is that for Taxa the "id" property is "$species", a text value containing spaces, whereas for all the others it is an alphanumeric value without spaces. I don't see how this could affect the outcome, though.

I apologize for the lengthy email.

================

Linux RHEL Server 6.5

Perl 5.18

Neo4j 2.1.7

Java 1.7

================

--

Sunit Jain

Research Computing Specialist -- Bioinformatics

Michigan Geomicrobiology Lab

Dept. of Earth & Environmental Sciences,

University of Michigan,

Ann Arbor, MI, USA.

web: www.sunitjain.com

meet: www.sunitjain.com/contact

Mark Jensen

Mar 18, 2015, 10:10:45 PM

to ne...@googlegroups.com

Thanks Sunit --
I think your problem is the difference highlighted in the code below. You're looking for the species with the key 'name', but adding to the index with key 'id'.

You may find the $idx->create_unique() method helpful too.
MAJ

if ($PhyloDist{$gene}{"DOMAIN"}) {
  my $species=$PhyloDist{$gene}{"SPECIES"};
  # Lookup uses the key 'name' ...
  ($taxa_nodes{$gene})= $idx->find_entries(name=>$species);
  unless ($taxa_nodes{$gene}) {
    $taxa_nodes{$gene}=REST::Neo4p::Node->new({id=>$PhyloDist{$gene}{"SPECIES"}});
    $taxa_nodes{$gene}->set_labels("Taxa");
    foreach (keys %{$PhyloDist{$gene}}){
      next if $_ eq "SPECIES";
      next if $_ eq "PERCENT";
      my $value=lc($PhyloDist{$gene}{$_});
      my $key=lc($_);
      $taxa_nodes{$gene}->set_property({$key=>$value});
    }
    # ... but the entry is added under the key 'id', so the lookup above never finds it.
    $idx->add_entry($taxa_nodes{$gene}, id=>$species);
  }
...
}
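
To make the effect of the key mismatch concrete, here is a minimal sketch in plain Perl that needs no server. `MockIndex` and `count_created` are hypothetical names for illustration only. The mock just mimics a key/value index and is not REST::Neo4p code, and the species name is made-up sample input.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for a key/value node index.
package MockIndex;

sub new { return bless { entries => {} }, shift }

# Store a node under key => value.
sub add_entry {
    my ($self, $node, $key, $value) = @_;
    push @{ $self->{entries}{$key}{$value} }, $node;
    return 1;
}

# Return all nodes stored under key => value (empty list if none).
sub find_entries {
    my ($self, $key, $value) = @_;
    return @{ $self->{entries}{$key}{$value} || [] };
}

package main;

# Count how many nodes get created for a list of species names when
# lookups use $find_key and new entries are indexed under $add_key.
sub count_created {
    my ($find_key, $add_key, @species) = @_;
    my $idx     = MockIndex->new;
    my $created = 0;
    for my $sp (@species) {
        my ($node) = $idx->find_entries($find_key => $sp);
        next if $node;
        $node = { id => $sp };    # stands in for REST::Neo4p::Node->new
        $created++;
        $idx->add_entry($node, $add_key => $sp);
    }
    return $created;
}

my @species = ('Escherichia coli') x 3;    # illustrative input only
print count_created('name', 'id', @species), "\n";    # mismatched keys: prints 3
print count_created('id',   'id', @species), "\n";    # matching keys: prints 1
```

With mismatched keys every occurrence of the species creates a fresh node, which matches the one-node-per-CSV-row behaviour described above.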

Sunit Jain

Mar 19, 2015, 8:24:07 AM

to ne...@googlegroups.com

Ahh! That's my bad! Sorry!

I corrected it and ran it again. But I still get some taxa that are the same but have multiple nodes. The updated code is available here (lines: 368-393). Here is the snippet:

Also, what's the difference between how I'm creating nodes and using the create_unique function? Aside from maybe saving me a few lines?

I was trying to use the function, but wasn't able to figure out what the first and second arguments were. I couldn't find it on the blog I mentioned above and in your slides they were both "name=>$pkg" or similar.



Mark Jensen

Mar 19, 2015, 4:38:39 PM

to ne...@googlegroups.com

The blog is very old and needs a complete update; unfortunately I have very few tuits these days...

create_unique() lets the Neo4j server do the lookup and creates the node or relationship if it doesn't exist in the index (it wraps a REST call that does this natively), so your code doesn't have to spend cycles on this.

Should work like so:

$new_or_existing_node = $node_idx->create_unique( $indexed_property_name => $value_for_new_node, $node_as_hash );
$new_or_existing_rel = $rel_idx->create_unique( $indexed_property_name => $value_for_new_rel, $from_node => $to_node, $relationship_type );
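
Applied to the Taxa case from earlier in the thread, that could look roughly like the sketch below. The helper names `taxa_properties` and `get_or_create_taxon`, and the choice of `id` as the indexed key, are assumptions for illustration. `$idx` is expected to be a REST::Neo4p::Index node index, or any object with a compatible `create_unique` method.

```perl
use strict;
use warnings;

# Build the property hash for a Taxa node from one %PhyloDist record.
# Keys and values are lowercased; SPECIES and PERCENT are skipped,
# mirroring the filtering loop in the snippet quoted above.
sub taxa_properties {
    my ($record) = @_;
    my %props;
    for my $field (keys %$record) {
        next if $field eq 'SPECIES' || $field eq 'PERCENT';
        $props{ lc $field } = lc $record->{$field};
    }
    return \%props;
}

# Ask the index for the node stored under id => $species; the node is
# created with the given properties only if no such entry exists yet.
# (Assumption: 'id' is the indexed key, as in the corrected lookup.)
sub get_or_create_taxon {
    my ($idx, $species, $record) = @_;
    my $props = taxa_properties($record);
    $props->{id} = $species;
    return $idx->create_unique( id => $species, $props );
}
```

Against a live server, `$idx` would typically come from `REST::Neo4p::Index->new('node', 'taxa_index')` after `REST::Neo4p->connect(...)`, where `taxa_index` is a placeholder name, and the label can still be set on the returned node with `set_labels("Taxa")` as in the original code. Because the lookup and the creation happen in one call, there is no separate `find_entries`/`add_entry` pair whose keys can drift apart.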

Mark Jensen

Mar 21, 2015, 11:24:42 AM

to ne...@googlegroups.com

create_unique() should also add the entry to the index automatically if it created a new node. I wonder if your add_entry call on line 382 could be executing more often than you want. Just a thought.
