First, congratulations on creating such a great Perl driver for Neo4j. I really appreciate the work you must have put into it.
I've been trying to use this driver to create a database for our meta*omic data. I was able to put together some Perl code by following some slides, the Neo4j blog post about this driver and the MetaCPAN description. However, I've reached a point where I'm no longer sure what's going on, and I'm hoping you might be able to help.
As a side note, the example on the Neo4j blog seemed quite limited and is about two years old. Is there a more recent version somewhere, perhaps one with best practices? If not, I'd be happy to start one based on my current project once I have at least one successful run. It won't be as insightful, but it'll be something.
Goal:
Create unique Taxa nodes, and link each gene locus to the Taxa node it belongs to with an "IN_ORGANISM" relationship:
(Taxa)<-[:IN_ORGANISM]-(Locus)
More details can be found in createDB.pl (lines 326-352), here.
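For reference, the intended get-or-create behavior can be sketched in plain Perl. This is a minimal sketch that does not touch REST::Neo4p at all: a hash stands in for the Neo4j index, the locus IDs and species names are made up, and `get_or_create_taxon` is a hypothetical helper. It shows that a species name containing spaces works fine as a Perl hash key, so each species should map to exactly one Taxa node with any number of IN_ORGANISM relationships pointing at it.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for the Neo4j index: species name => Taxa node id.
my %taxon_index;
my $next_node_id = 0;

# Get-or-create: return the existing Taxa node id for $species,
# creating a new one only the first time that name is seen.
sub get_or_create_taxon {
    my ($species) = @_;
    # Assumption: trim leading/trailing whitespace so stray spaces from the
    # CSV do not produce near-duplicate keys.
    (my $key = $species) =~ s/^\s+|\s+$//g;
    # //= only evaluates the right-hand side when the stored value is undefined.
    $taxon_index{$key} //= $next_node_id++;
    return $taxon_index{$key};
}

# Made-up (locus, species) rows, as if read from the CSV.
my @rows = (
    [ 'locus_0001', 'Escherichia coli' ],
    [ 'locus_0002', 'Escherichia coli' ],
    [ 'locus_0003', 'Bacillus subtilis' ],
);

# One IN_ORGANISM pair (locus => Taxa node id) per row; loci of the same
# species all point at the same shared Taxa node.
my @in_organism;
for my $row (@rows) {
    my ($locus, $species) = @$row;
    push @in_organism, [ $locus, get_or_create_taxon($species) ];
}

printf "%d Taxa nodes, %d IN_ORGANISM relationships\n",
    scalar(keys %taxon_index), scalar(@in_organism);
```

Run as-is, this prints "2 Taxa nodes, 3 IN_ORGANISM relationships". The whitespace trimming is an extra precaution, not something the real createDB.pl is known to do.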
Issue:
Here is the Perl snippet of my code that creates unique 'Taxa' nodes:
And here is the Perl snippet that creates the unique relationships to Taxa:
When I run this script, it creates the exact same Taxa node 94 times! A quick grep of my CSV showed 94 instances of that taxon, so the script essentially created a new node every time it encountered the species. I created scaffold, locus, COG, Pfam and Project nodes in much the same way, and only unique nodes were created in all of those cases. The only difference I can see is the "id" property: for Taxa it is "$species", a text value that contains spaces, whereas for all the others it is an alphanumeric string without spaces. I don't see how this could affect the outcome, though.
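Because grep matches substrings, it may be worth confirming that all 94 rows carry byte-identical species strings. The following is a rough sketch in core Perl. It assumes a plain comma-separated file with no quoted commas and the species in a hypothetical column 1 (0-based), and the sample rows are made up. `count_species` is a hypothetical helper.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Count each distinct species string exactly as stored (no trimming),
# so invisible differences such as trailing spaces show up as separate keys.
sub count_species {
    my ($lines, $column) = @_;
    my %count;
    for my $line (@$lines) {
        chomp(my $copy = $line);
        # Assumption: simple comma-separated data with no quoted commas.
        my @fields = split /,/, $copy, -1;
        $count{ $fields[$column] }++ if defined $fields[$column];
    }
    return \%count;
}

# Made-up sample rows: locus id, species (note the trailing space in row 2).
my @sample = (
    "locus_0001,Escherichia coli\n",
    "locus_0002,Escherichia coli \n",
    "locus_0003,Escherichia coli\n",
);

my $counts = count_species(\@sample, 1);
# Brackets make leading/trailing whitespace visible in the report.
printf "[%s] => %d\n", $_, $counts->{$_} for sort keys %$counts;
```

If every row reports the same bracketed string, the input is consistent and the duplication more likely comes from the node-creation step itself.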
I apologize for the lengthy email.
================
Linux RHEL Server 6.5
Perl 5.18
Neo4j 2.1.7
Java 1.7
================
--
Sunit Jain
Research Computing Specialist -- Bioinformatics
Michigan Geomicrobiology Lab
Dept. of Earth & Environmental Sciences,
University of Michigan,
Ann Arbor, MI, USA.