Using Cypher CSV import tool where data has commas

Eric Olson

Jun 12, 2014, 4:13:02 PM
to ne...@googlegroups.com
I am trying to import a large amount of data using Cypher's new LOAD CSV tool. The problem is that one of my properties will contain some arbitrary text which contains a lot of commas in itself. A basic picture of my data would look like:

{
  "id": 1234,
  "another_data": "data",
  "text": "This can have commas, so this part never gets imported!"
}


My generated CSV file then is laid out like:

id,another_data,text
1234,data,This can have commas, so this part never gets imported!

I have even tried quoting the text field, like this:

id,another_data,text
1234,data,"This can have commas, and this part still never gets imported!"



So, using the import tool like

USING PERIODIC COMMIT 50000
LOAD CSV WITH HEADERS FROM 'file:/mydata.txt' AS line
CREATE (c:Comment { id: line.id, another_data: line.another_data, text: line.text })

Gives me a node resulting in

{
  "id": 1234,
  "another_data": "data",
  "text": "This can have commas"
}

I only ever have one field that can have commas and have tried putting it at the end to see if it would just grab everything left over, but no luck. Ideas???

Wes Freeman

Jun 12, 2014, 4:26:49 PM
to Neo4J
You want to "quote" your CSV. From Wikipedia:
  • Fields containing a line-break, double-quote, and/or commas should be quoted. (If they are not, the file will likely be impossible to process correctly).

id,another_data,text
"1234","data","This can have commas, so this part never gets imported!"
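
To see why the quotes matter, here is a minimal Python sketch (not part of the original thread) that reads the quoted row above two ways: by splitting on commas naively, and with the standard library `csv` module. The variable names are illustrative only, and this shows how a generic CSV-aware parser treats the row rather than how LOAD CSV is implemented.

```python
import csv
import io

# The quoted example from above, as it would appear in the file.
raw = (
    'id,another_data,text\n'
    '"1234","data","This can have commas, so this part never gets imported!"\n'
)

# Naive splitting ignores the quotes and cuts the text field at its comma.
naive_fields = raw.splitlines()[1].split(",")

# A CSV-aware parser keeps the quoted field together as one value.
rows = list(csv.reader(io.StringIO(raw)))
header, record = rows[0], rows[1]

print(len(naive_fields))  # 4
print(len(record))        # 3
print(record[2])
```

The naive split yields four pieces because of the comma inside the text, while the CSV parser returns the three intended fields.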

Eric Olson

Jun 12, 2014, 4:31:57 PM
to ne...@googlegroups.com
Oh, quote EVERYTHING! Haha, I see. Thanks! I just solved it differently though. I found out that Cypher allows you to set the delimiter, so I turned it into a tab-separated file. Now my import looks like:

USING PERIODIC COMMIT 50000
LOAD CSV WITH HEADERS FROM 'file:/mydata.txt' AS line FIELDTERMINATOR '\t'
CREATE (c:Comment { id: line.id, another_data: line.another_data, text: line.text })

Voilà!

Wes Freeman

Jun 12, 2014, 4:34:09 PM
to Neo4J
That will work, until you have a tab in that arbitrary text. :) Then you'll need quotes again.

Wes

Wes Freeman

Jun 12, 2014, 4:35:37 PM
to Neo4J
Most decent CSV libraries allow you to automatically quote their output, btw, so quotes are escaped properly, etc.
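
As one illustration of that point, here is a rough sketch using Python's standard `csv` module with `QUOTE_ALL`. The thread does not say which language generated the file, so Python, the helper name `write_quoted_csv`, and the sample record are all assumptions made for the example.

```python
import csv
import io

def write_quoted_csv(rows, fieldnames):
    """Return CSV text in which every field is wrapped in double quotes."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        quoting=csv.QUOTE_ALL,   # quote every field, not only the risky ones
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical record whose text holds a comma, a tab, and embedded quotes.
comments = [
    {
        "id": 1234,
        "another_data": "data",
        "text": 'Commas, a\ttab, and "quotes" all survive',
    }
]

csv_text = write_quoted_csv(comments, ["id", "another_data", "text"])
print(csv_text)

# Reading the text back recovers the original field unchanged.
round_trip = list(csv.DictReader(io.StringIO(csv_text)))
print(round_trip[0]["text"] == comments[0]["text"])
```

The writer doubles any embedded double quotes (`""quotes""`), which is the common CSV convention; it is worth checking how your Neo4j version expects embedded quotes to be escaped before relying on it for an import.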

Eric Olson

Jun 12, 2014, 4:47:08 PM
to ne...@googlegroups.com
Lol. True :) I will do it the right way next time. Thanks for the lesson :)