batch import 2.0 > failed automatic indexing?


gg4u

Aug 14, 2014, 12:33:02 PM
to ne...@googlegroups.com
Hello folks!

I've done a batch import of 4M nodes and 100M rels.

I set up auto-indexes before the import, but apparently something went wrong: querying a single node like this takes ages:
match (n:Users)-[r:REL]-()
where n.id = 25
return r


Please help me understand whether I am doing the following steps correctly to create proper indexes for my nodes and properties.

Goal
  • I want to query my nodes by their name property, so I am going to index :User(name).
  • I want to traverse the graph: does the id of each node also need to be indexed, as :User(id)? Please note that this id is a key for indexing third-party sources, so I cannot change it.
***
My data look like this:
#node.csv
id:int:student  MyLabel:label name:string:user
3212 USER Mark
6367 USER Paula
...

(I set the headers by following the tutorial, but I am confused:
is name:string:user the correct syntax to give the property name to each node with label 'USER'?
Why doesn't the keyword 'student' show up as a label in the browser at localhost:7474/browser?
Is it a node label, or the name of the index?)



#rels.csv
id:int:student id:int:student type property
3212 6367 LOVERS april
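For anyone reproducing this setup, here is a minimal Python sketch that writes the two sample files above. It assumes the batch importer's default tab-separated layout and reads a header such as id:int:student as property id, type int, stored in a legacy index named student. The headers and rows are copied from the post; the helper write_tsv and the temporary output directory are hypothetical.

```python
import csv
import os
import tempfile

# Headers and rows copied from the post. Assumption: the importer expects
# tab-separated files and reads "id:int:student" as property "id", type int,
# stored in a legacy index named "student".
NODE_HEADER = ["id:int:student", "MyLabel:label", "name:string:user"]
NODE_ROWS = [["3212", "USER", "Mark"], ["6367", "USER", "Paula"]]

REL_HEADER = ["id:int:student", "id:int:student", "type", "property"]
REL_ROWS = [["3212", "6367", "LOVERS", "april"]]


def write_tsv(path, header, rows):
    """Write a header line plus data rows as tab-separated values (hypothetical helper)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)


out_dir = tempfile.mkdtemp()
nodes_path = os.path.join(out_dir, "node.csv")
rels_path = os.path.join(out_dir, "rels.csv")
write_tsv(nodes_path, NODE_HEADER, NODE_ROWS)
write_tsv(rels_path, REL_HEADER, REL_ROWS)

# Show the relationship file as the importer would read it.
with open(rels_path, encoding="utf-8") as f:
    print(f.read(), end="")
```

Note that the label column holds USER for every node, so any Cypher query or schema index has to use exactly that spelling.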

***
In batch.properties I specified:

batch_import.node_index.student=exact
batch_import.node_index.node_auto_index=exact


Once the data was imported, I could search neither by USER.id nor by USER.name.

***
In neo4j.properties,
I had node_auto_indexing and node_keys_indexable commented out, so I ran the batch import without them.

Did the indexing fail because of this?

Do I have to redo the import with these settings?

# Enable auto-indexing for nodes, default is false
node_auto_indexing=true

# The node property keys to be auto-indexed, if enabled
node_keys_indexable=name,id


Can anyone help shed some light on this?

Michael Hunger

Aug 29, 2014, 7:01:27 PM
to ne...@googlegroups.com
Did you create a schema index?

Labels and property names are case-sensitive: you use "USER" (without an s) in your batch import but "Users" (with an s and different capitalization) in Cypher.

create index on :Users(id);

Otherwise, see this post for the difference between legacy and schema indexes: http://nigelsmall.com/neo4j/index-confusion
The batch importer currently supports only legacy indexes.
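Why the spelling matters can be seen in a toy in-memory model of label/property indexes. This is a sketch, not Neo4j's implementation: the class ToySchemaIndex and its methods are hypothetical, and unlike a real schema index it does not backfill nodes that existed before the index was created. It only shows that labels are matched as exact, case-sensitive strings.

```python
class ToySchemaIndex:
    """Hypothetical stand-in for label/property indexes.

    Maps (label, property key) -> {property value: set of node ids}.
    Labels and keys are compared as exact, case-sensitive strings.
    """

    def __init__(self):
        self._indexes = {}

    def create_index(self, label, key):
        # Loosely analogous to "create index on :Label(key);".
        self._indexes.setdefault((label, key), {})

    def add_node(self, node_id, labels, props):
        # Only (label, key) pairs that have a declared index are recorded.
        for label in labels:
            for key, value in props.items():
                entries = self._indexes.get((label, key))
                if entries is not None:
                    entries.setdefault(value, set()).add(node_id)

    def seek(self, label, key, value):
        # None means no index exists for this exact (label, key) pair.
        entries = self._indexes.get((label, key))
        if entries is None:
            return None
        return entries.get(value, set())


idx = ToySchemaIndex()
idx.create_index("Users", "id")  # label as spelled in the Cypher query
idx.create_index("USER", "id")   # label as spelled in node.csv
idx.add_node(0, ["USER"], {"id": 3212, "name": "Mark"})
idx.add_node(1, ["USER"], {"id": 6367, "name": "Paula"})

print(idx.seek("Users", "id", 3212))  # set()
print(idx.seek("USER", "id", 3212))   # {0}
```

Even with both indexes declared, the lookup on :Users finds nothing, because every imported node carries the label USER.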

Michael

--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

gg4u

Oct 3, 2014, 6:57:42 AM
to ne...@googlegroups.com
Thank you, Michael, I will check the typos; the ones in the example I posted were just for illustration.

However, I do have indexes on my nodes: on the type (topic) and on the node properties name and id (id is an integer), and I set up node_auto_index.

But it takes AGES to locate a node, and ages to compute a simple query!
I suspect the problem is with the indexes, because querying by internal id is faster.

Please have a look at the index list below. Could you also give me a benchmark, so I have an idea of how long a simple query like

MATCH (n:topic)-[r]-(m) WHERE n.name = 'TITLE' RETURN m ORDER BY r.weight DESC LIMIT 6

should take on a graph of 4M nodes and 100M rels? I want to understand whether I am doing things right.


index            provider  configuration
topic            lucene    {"to_lower_case":"true", "type":"fulltext"}
name             lucene    {"to_lower_case":"true", "type":"fulltext"}
id               lucene    {"type":"exact"}
node_auto_index  lucene    {"type":"exact"}



Michael Hunger

Oct 3, 2014, 7:27:09 AM
to ne...@googlegroups.com
Run :schema in the browser or "schema" in the shell and see what schema indexes are listed.

And make sure to read this:  http://nigelsmall.com/neo4j/index-confusion

For this query:
MATCH (n:topic)-[r]-(m) WHERE n.name = 'TITLE' RETURN m ORDER BY r.weight DESC LIMIT 6

add a schema index:

create index on :topic(name);

Then it should run in a few ms.
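The reason an index on :topic(name) helps this query can be sketched with a toy model: one seek finds the start node, and only that node's relationships are then ranked by weight. The data, the dictionaries standing in for the graph and the index, and the function top_neighbours are all hypothetical; heapq.nlargest stands in for ORDER BY ... DESC LIMIT 6.

```python
import heapq

# Toy graph (hypothetical data): node id -> properties.
nodes = {0: {"labels": {"topic"}, "name": "TITLE"}}
for i in range(1, 11):
    nodes[i] = {"labels": {"topic"}, "name": f"other{i}"}

# Relationships of node 0 as (neighbour id, weight) pairs.
adjacency = {0: [(i, float(i)) for i in range(1, 11)]}

# Toy schema index on :topic(name): property value -> set of node ids.
name_index = {}
for node_id, node in nodes.items():
    if "topic" in node["labels"]:
        name_index.setdefault(node["name"], set()).add(node_id)


def top_neighbours(name, limit=6):
    """Roughly: MATCH (n:topic)-[r]-(m) WHERE n.name = <name>
    RETURN m ORDER BY r.weight DESC LIMIT <limit>."""
    candidates = []
    for start in name_index.get(name, ()):  # index seek instead of a full scan
        candidates.extend(adjacency.get(start, []))
    # Keep the `limit` heaviest relationships without sorting the whole list.
    best = heapq.nlargest(limit, candidates, key=lambda pair: pair[1])
    return [neighbour for neighbour, _weight in best]


print(top_neighbours("TITLE"))  # [10, 9, 8, 7, 6, 5]
```

After the seek, the work grows with the degree of the start node rather than with the size of the graph; with 100M relationships over 4M nodes the average degree is 2 * 100M / 4M = 50, although individual nodes can have far more.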

Michael

XDiscovery Team

Oct 3, 2014, 9:10:39 AM
to ne...@googlegroups.com
Oh, I see. I ran

create index on :topic(name);

and read your useful post.


Some more questions about the index confusion:

In your post you wrote that I should not mix the two kinds of indexes, and that if I need full-text search, legacy indexes are the ones to use.

1. Could you please advise me on the proper index type (legacy vs. schema) for a product with these requirements:

- the ability to fetch a node and its relationships / paths based on its names.
I can store names as node properties; a node can have a single name (e.g. 'Italy') or multiple ones (e.g. 'Italy', 'Italie', etc.)

- nodes are all of the same type (one label), at least for large sets (e.g. millions)

2. Since a node can have names in many languages, what would you suggest, keeping in mind that I have to perform full-text search:
- one node property per language
- a single array property holding the names [IT, DE, EN, FR, ...]
- separate name nodes, one for each name associated with a node, each indexing its own name (in this case I would have 4M nodes * m languages to index)

3. How could I improve responsiveness, even when using schema indexes?

As an example, here are some metrics:

- legacy index only, no schema index:
MATCH n-[r]-m WHERE n.name = 'title' RETURN m LIMIT 6
~126K ms

MATCH [..] query to identify all shortest paths between two nodes:
~600K ms


- legacy index plus schema index (create index on :topic(name)):
MATCH n-[r]-m WHERE n.name = 'title' RETURN m LIMIT 6
~6K ms

MATCH [..] query to identify all shortest paths between two nodes:
~18K ms

How can I reduce the time to fetch a node and its relationships and paths to milliseconds rather than thousands of milliseconds (production level)?

This is on a laptop with 8GB RAM, 6GB of which is dedicated to the JVM.

Thank you very much, Michael; this information was not easy to find. I read that full-text performance is on the roadmap, but I need to understand whether my data structure is OK and whether I am on the right path: MATCH me-[deployment]-[MVP]-neo4J :D
 







--
Luigi Assom
Founder & CEO @ XDiscovery - Crazy on Human Knowledge
Skype oggigigi

Luigi Assom

Oct 3, 2014, 9:30:13 AM
to ne...@googlegroups.com
Update:
After creating an index on an integer property, fetching a node by id seems to take the same time as fetching by name (full-text index): ~6K ms
create index on :topic(id);

(note: here id is not Neo4j's internal node identifier)

MATCH (n) WHERE n.id = 9996533 RETURN n;
~6K ms

while with the internal id:

MATCH (n) WHERE id(n) = 467383 RETURN n;
234 ms




Michael Hunger

Oct 3, 2014, 12:40:10 PM
to ne...@googlegroups.com
You forgot to use the label :topic in your query.


Fixed query:

MATCH (n:topic) WHERE n.id = 9996533 RETURN n;
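The effect of the missing label can be illustrated with a toy comparison of the two lookups (hypothetical data and counters, not Neo4j's actual query planner): without a label, the only option is to visit every node and filter on n.id, whereas with :topic the lookup can go straight to the :topic(id) index.

```python
# Toy store (hypothetical data): 100,000 nodes, all labelled :topic, each with an
# external "id" property. Only the relative amount of work matters here.
NUM_NODES = 100_000
nodes = [{"labels": {"topic"}, "id": 9_000_000 + i} for i in range(NUM_NODES)]

# Toy schema index on :topic(id): property value -> position of the node.
# Assumes the external ids are unique.
topic_id_index = {
    node["id"]: pos for pos, node in enumerate(nodes) if "topic" in node["labels"]
}


def match_without_label(value):
    """Like MATCH (n) WHERE n.id = value: no label, so scan every node and filter."""
    visited, hits = 0, []
    for pos, node in enumerate(nodes):
        visited += 1
        if node["id"] == value:
            hits.append(pos)
    return hits, visited


def match_with_label(value):
    """Like MATCH (n:topic) WHERE n.id = value: one seek in the :topic(id) index."""
    pos = topic_id_index.get(value)
    return ([] if pos is None else [pos]), 1


print(match_without_label(9_096_533))  # ([96533], 100000)
print(match_with_label(9_096_533))     # ([96533], 1)
```

In the toy, the unlabelled version does work proportional to the number of nodes while the labelled one does a single index lookup, which is consistent with the follow-up below, where the fixed query ran about as fast as the id(n) lookup.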

XDiscovery Team

Oct 3, 2014, 3:24:03 PM
to ne...@googlegroups.com
Oh, I see. I thought it didn't matter because I only have one type of node,
but I guess the difference is the schema index here: with your fix, the query looks the node up in the schema index, right?

Now I get the same time as with
where id(n) = ...

Thank you



