batch import 2.0 > failed automatic indexing?

gg4u

Aug 14, 2014, 12:33:02 PM

to ne...@googlegroups.com

Hello folks!

I've done a batch import of 4M nodes and 100M rels.

I set up autoindexes before the import, but something apparently went wrong: querying a single node takes ages, for example:

MATCH (n:Users)-[r:REL]-()
WHERE n.id = 25
RETURN n, r

Please help me understand whether I am doing the following steps correctly to create proper indexes for my nodes and properties.

Goal

I want to query my nodes by their name property, so I am going to index :User(name).
I want to traverse the graph: does the id of each node also need to be indexed, :User(id)? Please note that this id is a key for indexing third-party sources, so I cannot change it.
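A minimal sketch of the schema indexes this goal calls for, in Neo4j 2.x syntax. It assumes the label is spelled User as in the goal above, although the sample data below uses USER. Traversing relationships from a node does not use a property index at all. An index only speeds up finding the node a traversal starts from, so :User(id) is worth indexing only if queries look nodes up by that id.

```cypher
// Schema index for lookups by name (exact-match only; fulltext search
// still needs a legacy index).
CREATE INDEX ON :User(name);

// id is an external key that must stay unique, so a uniqueness constraint
// is an alternative to a plain index: it creates a schema index on
// :User(id) as a side effect. Use one or the other, not both, because
// creating the constraint fails if a plain index on :User(id) exists.
CREATE CONSTRAINT ON (u:User) ASSERT u.id IS UNIQUE;
```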

**

My data look like this:

#node.csv

id:int:student MyLabel:label name:string:user

3212 USER Mark

6367 USER Paula

...

(I set the headers by following the tutorial, but I am confused:

is name:string:user the correct syntax to give the property name to each node with label 'USER'?

Why does the keyword 'student' not show up as a label in the localhost:7474/browser?

Is it a label of nodes, or the name of the index?)

#rels.csv

id:int:student id:int:student type property

3212 6367 LOVERS april

***

In batch.properties I specified:

batch_import.node_index.name=fulltext

batch_import.node_index.student=exact

batch_import.node_index.node_auto_index=exact

After the import, I can search neither by USER.id nor by USER.name.
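If I read the batch-importer header convention correctly, the third field of a header such as name:string:users names a legacy Lucene index, not a label. That is an assumption worth checking against the importer's README. The same index name then needs a matching entry in batch.properties, and the index is only reachable through START, not through MATCH ... WHERE. A sketch with a hypothetical index called users:

```cypher
// Hypothetical index name "users": assumes the node header contains
// name:string:users and batch.properties contains
// batch_import.node_index.users=exact.
// Legacy indexes are addressed as node:<indexName>(<key>='<value>').
START n=node:users(name='Mark')
RETURN n;
```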

***

In neo4j.properties

I had node_auto_indexing and node_keys_indexable commented out, so I ran the batch import without them.

Did the indexing fail because of this?

Do I have to redo the import with these settings?

# Enable auto-indexing for nodes, default is false

node_auto_indexing=true

# The node property keys to be auto-indexed, if enabled

node_keys_indexable=name,id
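These two settings feed the legacy index node_auto_index. Even when that index is populated, Cypher's MATCH ... WHERE does not use it, so turning the settings on does not by itself speed up a labeled MATCH query. This thread does not establish whether the batch import fills the auto-index. A sketch of querying it directly, with Mark and Paula taken from the sample data:

```cypher
// Exact lookup in the legacy auto-index for a key listed in
// node_keys_indexable (name or id in the configuration above).
START n=node:node_auto_index(name='Mark')
RETURN n;

// The same index also accepts a Lucene query string instead of key='value'.
START n=node:node_auto_index('name:Mark OR name:Paula')
RETURN n;
```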

Any help shedding light on this?

Michael Hunger

Aug 29, 2014, 7:01:27 PM

to ne...@googlegroups.com

Did you create a schema index?

Labels and property names are case-sensitive: you use "USER" (without s) in your batch import but "Users" (with s and different capitalization) in Cypher.

create index on :Users(id);

Otherwise, see this post for the difference between legacy and schema indexes: http://nigelsmall.com/neo4j/index-confusion

The batch-importer currently only supports legacy indexes.
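A sketch of the same lookup with the label spelled the way the importer wrote it (USER, taken from the node.csv above). The schema index is populated in the background. It becomes usable once :schema in the browser, or schema in the shell, lists it as ONLINE.

```cypher
// The label in the index and in the query must match the importer's USER
// exactly; :Users (with s) is a different label that no node carries.
CREATE INDEX ON :USER(id);

// With the matching label, the planner can start from the :USER(id) index
// instead of scanning nodes. A RETURN clause is required to run the query.
MATCH (n:USER)-[r:REL]-()
WHERE n.id = 25
RETURN n, r;
```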

Michael

--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

gg4u

Oct 3, 2014, 6:57:42 AM

to ne...@googlegroups.com

Thank you Michael, I will check the typos; the names in the example I posted were only placeholders.

However, I do have indexes on my nodes: on the node type (topic) and on the node properties name and ID (id is an integer), and I set up node_auto_index.

But it takes AGES to locate a node, and ages to compute a simple query!

I suspect the problem is the indexes, because querying by internal id is faster.

Please have a look below. Could you give me a benchmark for how long a simple query

like

MATCH (n:topic)-[r]-(m) WHERE n.name = 'TITLE' RETURN m ORDER BY r.weight DESC LIMIT 6

should take on a graph of 4M nodes and 100M rels, so I can tell whether I am doing things right?

http://localhost:7474/webadmin/#/index/

topic lucene	{"to_lower_case":"true", "type":"fulltext"}
name lucene	{"to_lower_case":"true", "type":"fulltext"}
id lucene	{"type":"exact"}
node_auto_index lucene	{"type":"exact"}

Michael Hunger

Oct 3, 2014, 7:27:09 AM

to ne...@googlegroups.com

Run :schema in the browser or "schema" in the shell and see what schema indexes are listed.

And make sure to read this: http://nigelsmall.com/neo4j/index-confusion

For this query:

MATCH (n:topic)-[r]-(m) WHERE n.name = 'TITLE' RETURN m ORDER BY r.weight DESC LIMIT 6

run

create index on :topic(name);

Then it should run in a few ms.
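To check that the query really starts from the new index, the plan can be inspected. The sketch below assumes neo4j-shell on 2.x, which supports the PROFILE prefix. It returns the weight as a column so that the sort key is explicitly part of the result. A schema index matches exact values only, so 'TITLE' must match the stored casing. This differs from the legacy fulltext indexes listed above, which are configured with to_lower_case.

```cypher
CREATE INDEX ON :topic(name);

// PROFILE prints the executed plan, which shows whether the start node
// came from the :topic(name) index or from scanning all :topic nodes.
PROFILE
MATCH (n:topic)-[r]-(m)
WHERE n.name = 'TITLE'
RETURN m, r.weight AS weight
ORDER BY weight DESC
LIMIT 6;
```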

Michael

XDiscovery Team

Oct 3, 2014, 9:10:39 AM

to ne...@googlegroups.com

Oh I see, I ran

create index on :topic(name)

and read your useful post.

More questions about the confusion:

in your post you wrote that I should not mix the indexes, and that if I need fulltext search, legacy indexes are the ones to use.

1. Could you please brief me on the proper index type to use (legacy vs. schema) for a product like this:

- ability to fetch a node and its relationships / paths, based on its names.

I can set names as node properties; there could be a single one (e.g. 'Italy') or several (e.g. 'Italy', 'Italie', etc.)

- nodes are all of the same type (one label), at least for large sets (e.g. millions)

2. Since a node can have names in many languages, what would you suggest, keeping in mind that I have to perform fulltext search:

- a list of node properties, one per language

- an array property of names [IT, DE, EN, FR...]

- separate name nodes, one for each name associated with a node, each with its name indexed (in this case I would have a graph of 4M nodes × m languages to index)
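A sketch of the third option, using hypothetical names: the label Name, the relationship type NAME_OF, and the properties value and lang. None of these come from the thread. Each spelling becomes its own node, so one schema index on :Name(value) covers every language for exact matches. Fulltext matching would still need a legacy index.

```cypher
// Hypothetical model: one :Name node per spelling, linked to its :topic.
CREATE INDEX ON :Name(value);

// Attach two spellings to an existing topic (id value taken from this thread).
MATCH (t:topic {id: 9996533})
CREATE (:Name {value: 'Italy', lang: 'EN'})-[:NAME_OF]->(t),
       (:Name {value: 'Italie', lang: 'FR'})-[:NAME_OF]->(t);

// A lookup in any language can start from the :Name(value) index and then
// hop to the topic and its neighbours.
MATCH (:Name {value: 'Italie'})-[:NAME_OF]->(t:topic)-[r]-(m)
RETURN m
LIMIT 6;
```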

3. How could I improve responsiveness, even when using schema indexes?

As an example, here are some metrics:

- legacy index only, no schema index:

MATCH n-[r]-m WHERE n.name = 'tittle' RETURN m LIMIT 6

~126K ms

MATCH [..] query to identify all shortest paths between two nodes:

~600K ms

- legacy index plus schema index [create index on :topic(name)]:

MATCH n-[r]-m WHERE n.name = 'tittle' RETURN m LIMIT 6

~6K ms

MATCH [..] query to identify all shortest paths between two nodes:

~18K ms

How can I reduce the time for fetching a node and its relationships and paths to milliseconds rather than thousands of milliseconds (production level)?

I am using a laptop with 8GB RAM, 6GB of it dedicated to the JVM.
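For the shortest-path case, one common pattern is to find both endpoints through the schema index first and then bound the path length. The sketch uses hypothetical endpoint names 'A' and 'B' and an assumed limit of 4 hops. An unbounded search over 100M relationships can touch a large part of the graph. The first run of any query is also slower than later runs, because data is read from disk before it is cached.

```cypher
// Hypothetical endpoint names; each endpoint is found via the
// :topic(name) schema index before any path is expanded.
MATCH (a:topic {name: 'A'}), (b:topic {name: 'B'})
// Assumed upper bound of 4 hops on the path search.
MATCH p = allShortestPaths((a)-[*..4]-(b))
RETURN p;
```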

Thank you very much Michael, this information was not easy to find. I read that fulltext performance is on the roadmap, but I need to understand whether my data structure is OK and whether I am on the right path: MATCH me-[deployment]-[MVP]-neo4J :D


--

Luigi Assom

Founder & CEO @ XDiscovery - Crazy on Human Knowledge

www.xdiscovery.com | http://learn.xdiscovery.com

T +39 349 3033334

E in...@xdiscovery.com

Skype oggigigi

Luigi Assom

Oct 3, 2014, 9:30:13 AM

to ne...@googlegroups.com

Update:

After creating an index on a property of type integer, fetching a node by id seems to take the same time as fetching it by name (fulltext index): ~6K ms

create index on :topic(id)

(note: here id is not Neo4j's internal node identifier)

MATCH (n) WHERE n.id =9996533 Return n;

~6K ms

while with internal id

MATCH (n) WHERE id(n) =467383 Return n;

234 ms


Michael Hunger

Oct 3, 2014, 12:40:10 PM

to ne...@googlegroups.com

You forgot to use the label :topic in your query

Sent from my iPhone

On 03.10.2014 at 15:30, Luigi Assom <luigi...@gmail.com> wrote:


Fixed query

MATCH (n:topic) WHERE n.id =9996533 Return n;
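The same fixed lookup can be written with an explicit index hint. USING INDEX exists in Neo4j 2.x Cypher. With the label present the planner would likely pick the index anyway, but the hint makes that choice explicit. The second query illustrates a related pitfall: the index compares typed values, so the integer 9996533 and the string '9996533' are different values.

```cypher
// Explicit hint: the query fails with an error if no :topic(id) schema
// index exists, instead of silently falling back to a scan.
MATCH (n:topic)
USING INDEX n:topic(id)
WHERE n.id = 9996533
RETURN n;

// Returns nothing if id was imported as an integer, because the string
// '9996533' is not equal to the integer 9996533.
MATCH (n:topic)
WHERE n.id = '9996533'
RETURN n;
```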

--

XDiscovery Team

Oct 3, 2014, 3:24:03 PM

to ne...@googlegroups.com

Oh I see, I thought it didn't matter because I only have one type of node,

but I guess the difference here is the schema index: with your fix the query looks the node up through the schema index, right?

Now I get the same time as with

where id(n) = ...

Thank you

--
You received this message because you are subscribed to a topic in the Google Groups "Neo4j" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/neo4j/Xn3ml37dPDU/unsubscribe.
To unsubscribe from this group and all its topics, send an email to neo4j+un...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Reply all

Reply to author

Forward