Help getting started from a CSV file on Mac

Samantha Zeitlin

Apr 11, 2014, 2:09:57 PM
to ne...@googlegroups.com

I'm trying to use the batch-importer on a single CSV file (2086 rows) on a Mac. I'm new to Neo4j. I sat in on an "intro" session at Hackbright not too long ago and went through some of the tutorials, and actually interacting with the database looks pretty straightforward once it has been built.

But I'm stuck on the very first step: trying to build one and look at it. 

I'm just blindly following instructions; I don't really know what I'm doing here. Here is what I did:

At first, the old version of the batch-importer seemed (?) to work. I followed the instructions in Max De Marzi's blog post: http://maxdemarzi.com/2012/02/28/batch-importer-part-1/

My CSV file looks something like this:

Unnamed: 0    people          users           refs
0             0c8e29fd23c6    a974d06d9187
1             0c8e29fd23c6    c798a44cc829    0012d39f387d
2             0c8e29fd23c6    7d54ec992072    0134e76e7738
3             0c8e29fd23c6    52380be9221f    0153923edd56
...


(These would all be nodes; I want to add the relationships after the import, using something like this: http://stackoverflow.com/questions/13823988/batch-insertion-with-neo4j)


So first I ran this command:

java -server -Xmx4G -jar target/batch-import-jar-with-dependencies.jar target/db myfile.csv

and it seemed to run correctly, because I got output like this:

Importing 34400 Nodes took 0 seconds 

Total import time: 2 seconds

and a target folder full of files:

bash-3.2$ ls
active_tx_log neostore.propertystore.db.index.keys
messages.log neostore.propertystore.db.strings
neostore.id neostore.relationshipstore.db
neostore.nodestore.db neostore.relationshipstore.db.id
neostore.nodestore.db.id neostore.relationshiptypestore.db
neostore.propertystore.db neostore.relationshiptypestore.db.id
neostore.propertystore.db.arrays neostore.relationshiptypestore.db.names
neostore.propertystore.db.id nioneo_logical.log.active
neostore.propertystore.db.index tm_tx_log.1

So I thought that looked ok. Then, to start the server, I typed:

neo4j start

and in the browser I went to http://localhost:7474

And I can see the Neo4j dashboard etc., but I didn't see anything under the Data tab.

???

So although there seem to be .db files, there is no graph.db file, and I'm not sure how to do anything if I can't actually view my data after it has been imported. I tried opening the .db files themselves, but they're not human-readable (binary hashes?), so I can't tell whether that step actually worked.


So I googled around some more and thought maybe the problem is that my database isn't connected to the server. So next I tried: 

neo4j-shell

and it said:

NOTE: Remote Neo4j graph database service 'shell' at port 1337

and if I go to http://localhost:1337, the page is blank, which makes sense. 

So then I tried:

neo4j-shell -path db

but it said:

ERROR: java.io.FileNotFoundException: /db/messages.log (No such file or directory) 

This confuses me, since, as you can see above, there is indeed a messages.log file in the /db folder. And when I view it, there is a lot of stuff in there, but no big warning or error labels that I can see (?).

But then I realized I'm using the new (2.0.1) community version of Neo4j, and wondered whether that might be part of the problem. So I tried following the directions for the batch-importer-20 version, but it gives me different Java errors. Someone else on the Google group seemed to have a similar problem:

Exception in thread "main" java.lang.NoClassDefFoundError

and it looked like you advised them to use import-mvn.sh instead of import.sh, so I tried that, but Maven says there are no projects to compile. I think that makes sense; I just don't know what to do about it.

So...

I really just need step-by-step instructions, and I can't find them anywhere. Or at least, the ones I found that looked simple are either out of date (like @maxdemarzi's site) or didn't work for me, maybe because I'm missing some (probably really simple) steps or details in how to execute them.

I'm thinking that if I can get this to work, I might write it up in a blog post, since I don't think I'm the only one who wants to use Neo4j this way (?).

Thanks for any help you have time to provide. 

Sam

Michael Hunger

Apr 11, 2014, 7:07:03 PM
to ne...@googlegroups.com
Hi Samantha,

glad that you made it so far.

You imported the data into the "target/db" directory. Please note that because this directory resides in "target", it might be deleted if you run "mvn clean", which cleans out the whole "target" directory.
So it's better to specify a different directory to import into, like "graph.db" (without the "target/" prefix).

To use it with the server, you have to copy the _contents_ of that directory into /path/to/neo4j-server/data/graph.db, e.g.

cp -r target/db/* /path/to/neo4j-server/data/graph.db

Only after that, start the server.

If you use neo4j-shell without a server, you also have to provide the path:

neo4j-shell -path target/db

Please try that. It would be awesome if you wrote it up in a blog post.

P.S. Did you see the "binary download" instructions? They should be much easier:


Cheers,

Michael

--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Samantha Zeitlin

Apr 14, 2014, 11:28:17 AM
to ne...@googlegroups.com
OK, I had missed the instruction to copy the db into graph.db and to run neo4j from inside the bin folder. It's pretty confusing when there are so many things with the same names.

I was able to get it to work with a file that had commas in it (I used the wrong CSV file by accident). I still can't figure out how to add relationships if I don't already know what they are.

And when I go back and run the import script again with a tab-delimited "csv" file, I get a Java error again:

~/Desktop/neo4j-community-2.0.1/batch_importer_20$ ./import.sh test.db spaced_joined.csv 
Usage: Importer data/dir nodes.csv relationships.csv [node_index node-index-name fulltext|exact nodes_index.csv rel_index rel-index-name fulltext|exact rels_index.csv ....]
Using: Importer batch.properties test.db spaced_joined.csv rels.csv

Writing Configuration File to batch.properties

Total import time: 1 seconds 
Exception in thread "main" java.lang.IllegalArgumentException: Unknown Type  0
at org.neo4j.batchimport.importer.Type.fromString(Type.java:172)
at org.neo4j.batchimport.importer.AbstractLineData.createHeaders(AbstractLineData.java:46)
at org.neo4j.batchimport.importer.CsvLineData.<init>(CsvLineData.java:16)
at org.neo4j.batchimport.Importer.createLineData(Importer.java:173)
at org.neo4j.batchimport.Importer.importNodes(Importer.java:93)
at org.neo4j.batchimport.Importer.doImport(Importer.java:228)
at org.neo4j.batchimport.Importer.main(Importer.java:83)

Despite this error, and with no message about any nodes being imported, I still get a folder full of files. But when I look at it, there's no data.
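A note on the trace above: it fails in Type.fromString, called from AbstractLineData.createHeaders, so the importer stopped while parsing the header row rather than the data. The importer's headers use a name:type form (see name:string:User later in this thread), so one plausible reading is that a header cell like "Unnamed: 0", which pandas writes for a leftover index column and which appears in the sample at the top of the thread, gets split at the colon, and " 0" is then looked up as a type. Below is a minimal Python sketch that strips such a column before importing. It assumes tab-separated input and that a leading index column can be recognised by an empty header cell or one starting with "Unnamed"; the function name is hypothetical.

```python
import csv
import io


def drop_index_column(src_text: str) -> str:
    """Return tab-separated text without a leading pandas-style index column.

    Assumption (hypothetical): an index column is recognisable by an empty
    first header cell or one starting with "Unnamed", as in "Unnamed: 0".
    Any other input is passed through unchanged.
    """
    rows = list(csv.reader(io.StringIO(src_text), delimiter="\t"))
    if rows:
        first = rows[0][0].strip() if rows[0] else ""
        if first == "" or first.startswith("Unnamed"):
            # Drop the first cell of every row, header included, so no
            # header cell with a stray colon reaches the importer.
            rows = [row[1:] for row in rows]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()
```

For example, `Path("nodes.csv").write_text(drop_index_column(Path("spaced_joined.csv").read_text()))` (with `from pathlib import Path`) would write a cleaned copy; nodes.csv is just an illustrative name. If the file comes from pandas in the first place, exporting with `df.to_csv(path, sep="\t", index=False)` avoids writing the index column at all.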

Michael Hunger

Apr 14, 2014, 11:40:34 AM
to ne...@googlegroups.com
What do your files look like?

Michael

Samantha Zeitlin

Apr 14, 2014, 3:37:09 PM
to ne...@googlegroups.com
OK, I don't know what happened before. My file looked screwy, so I re-exported it and re-imported it, and this time it worked with no errors.

Now I just need to figure out how to get the relationships… 

In the meantime, I'll write down what I did before I forget… 

many thanks for your help! :)

sam

Michael Hunger

Apr 14, 2014, 4:13:50 PM
to ne...@googlegroups.com
For the relationships you have 3 options:

1. For a fresh db, just refer to the node's row number in the nodes file, minus 1, since node numbering starts at 0.
2. You can index the nodes on a property with an index, which you have to configure (e.g. name:string:User as the header in the nodes file), and then use name:string:User as the header of the first two columns in the rels file too; the importer will then look up the name field in the User index.
3. If you control the node IDs from the outside, use id:id as the header of the first column of the nodes file (providing the IDs), and start:id and end:id as the headers of the first two columns of the relationships file.

HTH

Michael
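Option 1 applied to a two-column person/recommender file (like the sample later in this thread) might look like the Python sketch below. It assumes, following the description above, that the first data row of the nodes file becomes node 0. It also assumes a relationships header of start, end, type, which should be checked against the importer's README. The RECOMMENDED type, the edge direction, and the function name are illustrative choices, not something the importer prescribes.

```python
from typing import Iterable, List, Tuple


def build_import_files(pairs: Iterable[Tuple[str, str]],
                       rel_type: str = "RECOMMENDED") -> Tuple[str, str]:
    """Return (nodes_text, rels_text) as tab-separated text.

    Assumptions (hypothetical, not checked against the importer source):
    - node IDs follow option 1: the first data row of the nodes file is
      node 0, the second is node 1, and so on;
    - the relationships file header is start, end, type;
    - RECOMMENDED is an illustrative relationship type name.
    """
    pairs = list(pairs)  # iterate twice below, so materialise generators
    node_id = {}  # name -> row-based node ID, in order of first appearance
    for person, recommender in pairs:
        for name in (person, recommender):
            if name not in node_id:
                node_id[name] = len(node_id)

    # Dicts preserve insertion order, so row i of the nodes file is node i.
    nodes_lines: List[str] = ["name"] + list(node_id)
    rels_lines: List[str] = ["start\tend\ttype"]
    for person, recommender in pairs:
        # Direction chosen for illustration: recommender -> person.
        rels_lines.append(f"{node_id[recommender]}\t{node_id[person]}\t{rel_type}")
    return "\n".join(nodes_lines) + "\n", "\n".join(rels_lines) + "\n"
```

Writing the two strings to a nodes file and a relationships file and passing both to import.sh (its usage line above takes the data directory, nodes file, and relationships file) would then load nodes and relationships in one run. Because every name appears only once in the nodes file, someone who shows up as both a person and a recommender gets a single node ID.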

Samantha Zeitlin

Apr 23, 2014, 11:32:12 AM
to ne...@googlegroups.com
I'm not sure I understand your suggestions, or maybe I need to do the step before that: the relationships are not predetermined in my file. I think I have to identify them?

Hopefully Neo4j will at least make it easier to view the data and do additional measurements once the web of connections is created?

Basically, there are two columns in the original file. One column lists primary nodes; the other lists related nodes. That sounds simple enough to handle with an index.

However, I think some of the primary nodes are also connected to other primary nodes. That is the interesting part of the data, and why I think this is a graph database problem rather than an SQL-type problem.

So my data would look something like this, if it were human-readable:

person     recommender
Bob        Andy
Nancy      Andy
Jim        Bob
Harry      Jim
Melissa    Carrie

What I'm trying to understand is: do I have to find and hard-code those relationships ahead of time?

I'm asking because I'm trying to decide whether it makes more sense to loop through the data set and add a column with a count, or to list pairs, or whether there is some better way to denote which nodes are connected to which other nodes. Or does it make more sense to melt and cast the data set into separate columns for each user?

How do you typically handle this? Is it easier to write something in Cypher that will identify and create the relationships automatically, e.g. when the same node is listed in more than one place across both columns?

Sam
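On whether the primary-to-primary links have to be found ahead of time: once each name becomes exactly one node and each row becomes one edge, links like Harry -> Jim -> Bob -> Andy fall out on their own. This works because Bob is the same node whether he appears as a person or as a recommender. The Python sketch below shows this on the sample pairs above by following "recommended by" links breadth-first. In Neo4j itself, the equivalent question would normally be asked with a variable-length path query in Cypher after loading, rather than precomputed in the file. The function name is hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple


def recommendation_chain(pairs: List[Tuple[str, str]], start: str) -> List[str]:
    """Follow 'recommended by' links from start; return names in BFS order.

    pairs are (person, recommender) rows taken verbatim from the file;
    nothing about which primary nodes connect to each other is precomputed.
    """
    recommended_by: Dict[str, List[str]] = defaultdict(list)
    for person, recommender in pairs:
        recommended_by[person].append(recommender)

    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in recommended_by[current]:
            if nxt not in seen:  # guard against cycles in real data
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order
```

With the five sample rows, starting from "Harry" yields Harry, Jim, Bob, Andy, without any extra count or pair column in the input.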