Iterate through an array of strings in Cypher


Yin Wang

Sep 11, 2013, 12:09:30 AM
to ne...@googlegroups.com
Hi,

How can I iterate through an array of strings and do something with each string?

My use case: there are company nodes, each of which contains an array of its employees' user IDs. For example, a company may look like {name: "fantastic company", employees: ["joe1234534", "jan223434", ...]}.

For performance reasons, I'm loading the company nodes and employee nodes into Neo4j separately, in batches. Then I want to iterate through the 'employees' array in each company node and create the relationships: company-[:HIRES]->employee.

But although there is a FOREACH operator, I don't see how it can be used for this purpose. Or do I need to use something else?

Thanks.


Wes Freeman

Sep 11, 2013, 12:42:35 AM
to ne...@googlegroups.com
The trouble is doing an index lookup with the strings. Which version of Neo are you using? What kind of index are the employee ids in?

If you keep this model, I think you're probably going to need to break it up into multiple cypher statements. Or use the lower-level API.

Also, keeping a list of employee ids inside of the company node is probably not giving you much performance benefit on the import. :( You might consider merging the import so you can more effectively build the relationships in, solving this problem as well.

Wes


--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+un...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Yin Wang

Sep 11, 2013, 12:53:17 AM
to ne...@googlegroups.com


On Tuesday, September 10, 2013 9:42:35 PM UTC-7, Wes Freeman wrote:
The trouble is doing an index lookup with the strings. Which version of Neo are you using? What kind of index are the employee ids in?

I'm using 2.0.0.M04. I just built an index using 

    CREATE INDEX ON :Employee(user_id)

 

If you keep this model, I think you're probably going to need to break it up into multiple cypher statements. Or use the lower-level API.

Aww.... headache

 

Also, keeping a list of employee ids inside of the company node is probably not giving you much performance benefit on the import. :( You might consider merging the import so you can more effectively build the relationships in, solving this problem as well.

It's not just about the performance benefit. It doesn't seem to be possible to merge the imports. The number of Employee records is huge (millions). It's infeasible to find the employees hired by a certain company before they are loaded into a database. So you are asking me to have them already in a database, and we come to the chicken-and-egg problem ;-)

Wes Freeman

Sep 11, 2013, 1:18:38 AM
to ne...@googlegroups.com
This is the only thing I could come up with:

START e=node(*) 
MATCH c:Company 
WHERE e.user_id IN (c.employees) AND NOT (c-[:HIRED]->e) 
WITH c, e
LIMIT 1000
CREATE UNIQUE c-[:HIRED]->e 

Just keep running it until nothing changes.
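
A minimal sketch of that repeat-until-done loop in Python, for illustration only. `run_batch` is a hypothetical callable (not part of any Neo4j API) that is assumed to execute the statement above once and return how many relationships it created; `max_rounds` is an added safety cap, not something from the original post.

```python
from typing import Callable


def run_until_unchanged(run_batch: Callable[[], int], max_rounds: int = 10_000) -> int:
    """Call run_batch repeatedly until it reports that nothing was created.

    run_batch is a hypothetical stand-in: it should run the batched
    statement once and return the number of relationships created.
    Returns the total number of relationships created over all rounds.
    """
    total = 0
    for _ in range(max_rounds):
        created = run_batch()
        if created == 0:
            # Nothing changed in this round, so there is no work left.
            break
        total += created
    return total
```

With the `LIMIT 1000` in the statement, each round creates at most 1000 relationships, so the number of rounds grows with the amount of work left.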

Wes

Wes Freeman

Sep 11, 2013, 1:30:03 AM
to ne...@googlegroups.com
Btw, this won't use an index, so it's going to be slow. :(

Breaking it out into multiple statements is likely far better. Loop this until you can't find any more in the first query:

MATCH c:Company 
WHERE NOT (c-[:HIRED]->()) 
RETURN c.company_id, c.employees 
LIMIT 1

... then, for each user id in employees:

MATCH e:Employee, c:Company
USING INDEX c:Company(company_id)
USING INDEX e:Employee(user_id)
WHERE e.user_id = {user_id}
AND c.company_id = {company_id}

CREATE UNIQUE c-[:HIRED]->e

(also assumes you have an index and a unique id on Company)
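
One way the client-side loop might look in Python, as a sketch under stated assumptions. `run_query` is a hypothetical callable (not a real Neo4j driver function) that takes a Cypher string and a parameter dict and returns the result rows as a list of dicts. The `AS` aliases are added here so the row keys are predictable, and the index hints are left out. The `seen` check is also an addition: a company whose ids never match an Employee would otherwise be returned by the first query forever.

```python
from typing import Any, Callable, Dict, List

# Hypothetical signature: (cypher, params) -> list of row dicts.
RunQuery = Callable[[str, Dict[str, Any]], List[Dict[str, Any]]]

# First query from the post, with AS aliases added for stable row keys.
NEXT_COMPANY = """
MATCH c:Company
WHERE NOT (c-[:HIRED]->())
RETURN c.company_id AS company_id, c.employees AS employees
LIMIT 1
"""

# Second query from the post, without the index hints.
LINK_EMPLOYEE = """
MATCH e:Employee, c:Company
WHERE e.user_id = {user_id}
AND c.company_id = {company_id}
CREATE UNIQUE c-[:HIRED]->e
"""


def link_all(run_query: RunQuery) -> int:
    """Fetch one unlinked company at a time and link each listed employee.

    Returns the number of link statements issued.
    """
    issued = 0
    seen = set()
    while True:
        rows = run_query(NEXT_COMPANY, {})
        if not rows:
            break  # every company has at least one HIRED relationship
        company = rows[0]
        if company["company_id"] in seen:
            # Same company came back again: none of its ids produced a
            # relationship, so stop instead of looping forever.
            break
        seen.add(company["company_id"])
        for user_id in company["employees"]:
            params = {"user_id": user_id, "company_id": company["company_id"]}
            run_query(LINK_EMPLOYEE, params)
            issued += 1
    return issued
```

Because the first query uses `LIMIT 1`, a company that can never be linked blocks the ones after it; this sketch simply stops at that point rather than trying to skip it.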

Wes

Wes Freeman

Sep 11, 2013, 1:35:42 AM
to ne...@googlegroups.com
Also, it looks like the index hints weren't necessary. I thought it would pick just one, but it does both without hints. Replied to your SO question also.

Michael Hunger

Sep 11, 2013, 1:47:22 AM
to ne...@googlegroups.com
Use something like the csv batch-importer to import the data.


Much easier.

Michael

Yin Wang

Sep 11, 2013, 1:19:46 PM
to ne...@googlegroups.com
I tried the batch importer before, but it requires stopping the server. I'm not sure whether there is an alternative that doesn't.