CSV import : Cannot merge node using null property value

2,798 views
Skip to first unread message

Jean Villedieu

unread,
Nov 4, 2014, 4:39:44 AM11/4/14
to ne...@googlegroups.com
Hi,

I have already experimented with the CSV import and now I'm trying to use it with the Crunchbase database.
Thanks to Michael Hunger's posts, I have managed to get started but I keep getting this message :
Cannot merge node using null property value for founded_year (Failure when processing URL 'file:c:/Users/Jean/Downloads/CompaniesTEST.csv' on line 4 (which is the last row in the file). No rows seem to have been committed. Note that this information might not be accurate.)
I think it has to do with null values.

Here is what I used to do the importation :
create constraint on (a:Company) assert a.permalink is unique
create constraint on (b:Category) assert b.category_list is unique
create constraint on (c:MARKET) assert c.market is unique
create constraint on (d:STATUS) assert d.status is unique
create constraint on (e:COUNTRY) assert e.country_code is unique
create constraint on (f:STATE) assert f.state_code is unique
create constraint on (g:REGION) assert g.region is unique
create constraint on (h:CITY) assert h.city is unique
CREATE INDEX ON :Company(permalink)
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:c:/Users/Jean/Downloads/CompaniesTEST.csv" AS line
MERGE (a:COMPANY {permalink: line.permalink, funding_total: line.funding_total_usd, funding_rounds: toInt(line.funding_rounds), founded_at: line.founded_at, founded_month: line.founded_month, founded_quarter: line.founded_quarter, founded_year: toInt(line.founded_year), first_funding_at: line.first_funding_at, last_funding_at: line.last_funding_at})
MERGE (b:CATEGORY {name: line.category_list})
MERGE (c:MARKET {name: line.market})
MERGE (d:STATUS {name: line.status})
MERGE (e:COUNTRY {name: line.country_code})
MERGE (f:STATE {name: line.state_code})
MERGE (g:REGION {name: line.region})
MERGE (h:CITY {name: line.city})
CREATE (a)-[:HAS_CATEGORY]->(b)
CREATE (a)-[:HAS_MARKET]->(c)
CREATE (a)-[:HAS_STATUS]->(d)
CREATE (a)-[:HAS_COUNTRY]->(e)
CREATE (a)-[:HAS_STATE]->(f)
CREATE (a)-[:HAS_REGION]->(g)
CREATE (a)-[:HAS_CITY]->(h)

FYI when I create the COMPANY nodes with permalink as a single property, it works. What am I doing wrong?
The csv file I'm using can be found here : https://gist.github.com/jvilledieu/c3afe5bc21da28880a30

Thanks for your help!

Mark Needham

unread,
Nov 4, 2014, 4:53:20 AM11/4/14
to ne...@googlegroups.com
Try this:

LOAD CSV WITH HEADERS FROM "file:c:/Users/Jean/Downloads/CompaniesTEST.csv" AS line
WITH line WHERE line.founded_year is null
RETURN line

Does it return any rows? 

--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jean Villedieu

unread,
Nov 4, 2014, 5:10:05 AM11/4/14
to ne...@googlegroups.com
Thanks  Mark. It returns nothing : Returned 0 rows in 276 ms

Mark Needham

unread,
Nov 4, 2014, 5:11:34 AM11/4/14
to ne...@googlegroups.com
Hmm ok, what about:

LOAD CSV WITH HEADERS FROM "file:c:/Users/Jean/Downloads/CompaniesTEST.csv" AS line
WITH line WHERE line.founded_year = ""
RETURN line

Jean Villedieu

unread,
Nov 4, 2014, 5:17:58 AM11/4/14
to ne...@googlegroups.com
I get :
cityLos Angeles
name&TV Communications
category_list|Games|
founded_at
first_funding_at04/06/2010
permalink/organization/tv-communications
marketGames
founded_quarter
founded_year
country_codeUSA
homepage_urlhttp://enjoyandtv.com
funding_total_usd4�000�000
founded_month
funding_rounds2
statusoperating
regionLos Angeles
state_codeCA
last_funding_at23/09/2010

Mark Needham

unread,
Nov 4, 2014, 5:21:31 AM11/4/14
to ne...@googlegroups.com
Ah ok, so I think when you call toint on that row it's becoming null? 

$ return toint("");
+-----------+
| toint("") |
+-----------+
| <null>    |
+-----------+
1 row
1296 ms

But in fact I'd suggest that what makes a company unique is the permalink so you don't actually need to have founded_year in the MERGE statement anyway. 


USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:c:/Users/Jean/Downloads/CompaniesTEST.csv" AS line
MERGE (a:COMPANY {permalink: line.permalink})
ON CREATE SET a.
funding_total = line.funding_total_usd, a.funding_rounds = toInt(line.funding_rounds), a.founded_at =  line.founded_at, a.founded_month = line.founded_month, a.founded_quarter = line.founded_quarter, a.founded_year = toInt(line.founded_year), a.first_funding_at = line.first_funding_at, a.last_funding_at = line.last_funding_at
MERGE (b:CATEGORY {name: line.category_list})
MERGE (c:MARKET {name: line.market})
MERGE (d:STATUS {name: line.status})
MERGE (e:COUNTRY {name: line.country_code})
MERGE (f:STATE {name: line.state_code})
MERGE (g:REGION {name: line.region})
MERGE (h:CITY {name: line.city})
CREATE (a)-[:HAS_CATEGORY]->(b)
CREATE (a)-[:HAS_MARKET]->(c)
CREATE (a)-[:HAS_STATUS]->(d)
CREATE (a)-[:HAS_COUNTRY]->(e)
CREATE (a)-[:HAS_STATE]->(f)
CREATE (a)-[:HAS_REGION]->(g)
CREATE (a)-[:HAS_CITY]->(h)

Make sure you have an index on :COMPANY before running or it'll be slow!

CREATE INDEX ON :COMPANY(permalink)

Jean Villedieu

unread,
Nov 4, 2014, 6:07:54 AM11/4/14
to ne...@googlegroups.com
Looks like it is working :)
As usual, I'll do a blog post documenting all this. Thanks for your help Mark!

Mark Needham

unread,
Nov 4, 2014, 6:10:53 AM11/4/14
to ne...@googlegroups.com
Cool! 

I want to improve the documentation so it's more obvious what to do in the future. Am I right in assuming that this is the page in the docs that you'd likely read about MERGE? http://neo4j.com/docs/stable/query-merge.html

Jean Villedieu

unread,
Nov 4, 2014, 7:17:31 AM11/4/14
to ne...@googlegroups.com
That's the one!
Reply all
Reply to author
Forward
0 new messages