Scale limits when storing properties on relationships


John Fry

Aug 8, 2016, 12:05:17 AM
To: Neo4j
Hi All,

In Neo4j 2.3, what and where are the limits when storing properties on relationships?

I have a graph with about 200M relationships and for each relationship I want to add floating point attributes as properties. 
Here is what I am experiencing:
  • adding 2 properties per rel - all works fine; very good performance
  • adding 5 properties per rel - start to see exceptions/crashes - can be fixed by turning off transaction logging - good performance
  • adding ~7 properties per rel - performance fades dramatically (10x slower) - occasional exceptions/crashes
  • adding ~10 properties per rel - performance stalls/stops - eventually it will crash
What is a realistic set of expectations for storing this many properties, where the relationship store could easily exceed 20 GB?

Regards and thanks for any advice, John.

Michael Hunger

Aug 8, 2016, 3:12:13 AM
To: ne...@googlegroups.com, ra...@neo4j.com
Hi John,

Do you have more details on the properties that you add as well as your graph model and queries? Without these details it will be hard to help. 

It sounds a bit as if your property-heavy relationships might be nodes in hiding.

Cheers Michael


--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

John Fry

Aug 8, 2016, 12:56:07 PM
To: Neo4j, ra...@neo4j.com
Hello Michael,

the graph is used as follows:
  • ~10M nodes; ~200M relationships 
  • Each relationship requires multiple floating-point properties that can be considered connection-strength weights. These multiple weights make up a weight vector - up to ~20 weights per vector
  • The weights on the relationship are static (or at least they rarely change)
  • The weight vector is used to compute custom (very algorithmic in nature) costs per link to drive node-to-node traversals, expansions and to find cost based n-shortest paths
  • The costs per link are calculated in as close to real time as possible and are always different and are never stored or written back to the relationships in the graph
Regards, John.

Michael Hunger

Aug 9, 2016, 6:35:50 AM
To: ne...@googlegroups.com
Hi John,

Which kind of "transaction logging" did you turn off?

Would you be able to share the queries you are using?

Each double property takes 8 bytes of storage in the property record (property records are linked in a chain, and each property record can hold up to 4 4-byte-storage properties).

But arrays are optimized: especially if you have small values in your weights, it tries to use only the significant bits to encode values in an array (but I think it might only do that for integer values).

Would you be able to run a test where instead of having 5-10 individual properties you just use an array with that many entries?

And perhaps even better, project the floating-point values to integer values in that array.
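
A minimal sketch of what projecting the weights to integers could look like, assuming a fixed-point encoding. The class name WeightQuantizer, the SCALE factor and the property name "weights" are illustrative assumptions, not something from this thread; the right scale depends on the range and precision the weights actually need.

```java
import java.util.Arrays;

/** Sketch: pack a weight vector into a single int[] instead of many double properties. */
final class WeightQuantizer {
    // Hypothetical fixed-point scale (keeps roughly 4 decimal places).
    static final float SCALE = 10_000f;

    /** Maps each float weight to a scaled integer, rounding to the nearest value. */
    static int[] quantize(float[] weights) {
        int[] out = new int[weights.length];
        for (int i = 0; i < weights.length; i++) {
            out[i] = Math.round(weights[i] * SCALE);
        }
        return out;
    }

    /** Reverses quantize(), up to the rounding error introduced by SCALE. */
    static float[] dequantize(int[] stored) {
        float[] out = new float[stored.length];
        for (int i = 0; i < stored.length; i++) {
            out[i] = stored[i] / SCALE;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] weights = {0.25f, 0.5f, -1.0f};
        int[] packed = quantize(weights);
        // With the Neo4j Java API the packed array would be written as one property,
        // e.g. relationship.setProperty("weights", packed);
        System.out.println(Arrays.toString(packed));
        System.out.println(Arrays.toString(dequantize(packed)));
    }
}
```

Note that Math.round maps NaN to 0, so non-finite weights would need to be handled before quantizing.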

I will also ask our kernel engineers for other tips in this regard.

HTH,

Michael


John Fry

Aug 9, 2016, 9:51:37 AM
To: Neo4j
Hi Michael, thanks...

some more background info on the queries:
* note I am using Neo4j version 2.2 (I guess I should finally upgrade to 3+)
* everything I do is via the Java API
* the queries are traversals and expansions: 
--- I walk the graph node to node selecting each node by a function of the weight vectors
--- I expand around a node to a depth of n for both incoming and outgoing directions
--- I commonly use shortest path using dijkstra with my own cost evaluators that use the weight vectors
--- once I have a reliable way to write all the properties I will use the graph exclusively in 'read-only' mode. I only write the properties as part of a graph creation process which is a single event usage - fast and predictable creation of course is nice to achieve.
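
A minimal sketch of a per-link cost computed on the fly from a stored weight vector, as described in the list above. The actual cost function is not given in this thread, so the weighted sum below is only a placeholder, and the class name LinkCost and the coefficients array are hypothetical. In Neo4j 2.x such a function could be wrapped in a CostEvaluator<Double> and passed to GraphAlgoFactory.dijkstra(...).

```java
/** Sketch: derive a traversal cost from a relationship's weight vector at query time. */
final class LinkCost {
    /**
     * Placeholder cost: weighted sum of the stored weights, clamped at zero
     * because Dijkstra requires non-negative edge costs.
     */
    static double cost(float[] weights, double[] coefficients) {
        if (weights.length != coefficients.length) {
            throw new IllegalArgumentException("weights and coefficients differ in length");
        }
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            sum += coefficients[i] * weights[i];
        }
        return Math.max(sum, 0.0);
    }

    public static void main(String[] args) {
        // In a real evaluator the weights would come from
        // (float[]) relationship.getProperty("lwa_lcm").
        float[] weights = {1.0f, 2.0f};
        double[] coefficients = {0.5, 0.25};
        System.out.println(cost(weights, coefficients));
    }
}
```

Because the cost is recomputed for every call, nothing has to be written back to the graph, which matches the read-only usage described above.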

I turn off transaction logging with: keep_logical_logs=false.

Let me try using an integer array as a single property and see how that performs.

Thanks, John.

Michael Hunger

Aug 9, 2016, 4:32:21 PM
To: ne...@googlegroups.com
Oh sorry, I might have misunderstood you.

Do you see the performance issue when creating the data or when accessing it?

Could you share your graph-creation code?

M


John Fry

Aug 13, 2016, 12:25:30 PM
To: Neo4j

Hi Michael, using arrays to store the properties solved the performance issues, as you suggested. The application now easily completes 10x faster.


BUT it creates another problem. From the pseudo-code below I see the following behaviour:

  • when the array lcm contains 16 test values (all -1.0f), the application runs at full performance and I can open the db (via neo4j-shell) and see that the relationships have 16 x -1.0f stored in a property array
  • when lcm contains real and different values (e.g. 16 random floats), the application runs at full performance BUT the db won't open in neo4j-shell - it fails with the exception shown below
  • if I limit the size of the lcm array to 2 or 4 real/random floats, then it works

I am guessing the property stores are compressed or something?


Regards, John.




    public class ScoredLink {
        long id;
        float[] lcm = new float[16];
        // ...etc
    }

    public static void main(String[] args) {
        // ...do the math and score the 200M links locally, in memory
        // open the neo4j db
        // create batches of 500 relationships/links to write back
        // push the batches into a thread pool
        // for each thread...
        try ( Transaction tx = db.beginTx() ) {
            for (int i = start; i <= end; i++) { // i.e. end - start = 500
                ScoredLink sl = scoredLinks.get(i);
                Relationship l = db.getRelationshipById(sl.id);
                l.setProperty("lwa_lcm", sl.lcm); // all 16 lcm vals
            }
            // mark the transaction successful inside the try block;
            // try-with-resources then closes (commits) it
            tx.success();
        }
        // ...etc
    }
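
A minimal sketch of the batching step from the pseudo-code above: splitting the relationship index range into chunks of at most 500 so that each chunk can be written in its own transaction. The class name BatchRanges is hypothetical, and half-open [start, end) ranges are used here to avoid the off-by-one that an inclusive `i<=end` loop invites.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: split a range of link indexes into fixed-size batches. */
final class BatchRanges {
    /** Returns {start, end} pairs (end exclusive) covering 0..total in chunks of at most batchSize. */
    static List<int[]> split(int total, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<int[]> ranges = new ArrayList<>();
        for (int start = 0; start < total; start += batchSize) {
            ranges.add(new int[] {start, Math.min(start + batchSize, total)});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Each range would be handed to one worker, which opens a transaction,
        // writes the properties for indexes start..end-1 and commits.
        for (int[] r : split(1200, 500)) {
            System.out.println(r[0] + " .. " + r[1]);
        }
    }
}
```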



ubuntu@ip-172-31-3-11:/opt/RAI/bin$ sudo neo4j-shell -v -path /opt/neo4j/data/graph.db/
ERROR (-v for expanded information):
Error starting org.neo4j.kernel.impl.factory.CommunityFacadeFactory, /opt/neo4j/data/graph.db
java.lang.RuntimeException: Error starting org.neo4j.kernel.impl.factory.CommunityFacadeFactory, /opt/neo4j/data/graph.db
at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:143)
at org.neo4j.kernel.impl.factory.CommunityFacadeFactory.newFacade(CommunityFacadeFactory.java:43)
at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:108)
at org.neo4j.graphdb.factory.GraphDatabaseFactory.newDatabase(GraphDatabaseFactory.java:129)
at org.neo4j.graphdb.factory.GraphDatabaseFactory$1.newDatabase(GraphDatabaseFactory.java:117)
at org.neo4j.graphdb.factory.GraphDatabaseBuilder.newGraphDatabase(GraphDatabaseBuilder.java:185)
at org.neo4j.shell.kernel.GraphDatabaseShellServer.instantiateGraphDb(GraphDatabaseShellServer.java:203)
at org.neo4j.shell.kernel.GraphDatabaseShellServer.<init>(GraphDatabaseShellServer.java:66)
at org.neo4j.shell.StartClient.getGraphDatabaseShellServer(StartClient.java:282)
at org.neo4j.shell.StartClient.tryStartLocalServerAndClient(StartClient.java:259)
at org.neo4j.shell.StartClient.startLocal(StartClient.java:247)
at org.neo4j.shell.StartClient.start(StartClient.java:180)
at org.neo4j.shell.StartClient.main(StartClient.java:135)
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.kernel.recovery.Recovery@10c38489' failed to initialize. Please see attached cause exception.
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.init(LifeSupport.java:434)
at org.neo4j.kernel.lifecycle.LifeSupport.init(LifeSupport.java:66)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:102)
at org.neo4j.kernel.NeoStoreDataSource.start(NeoStoreDataSource.java:600)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:452)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:111)
at org.neo4j.kernel.impl.transaction.state.DataSourceManager.start(DataSourceManager.java:112)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:452)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:111)
at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:139)
... 12 more
Caused by: java.lang.IllegalArgumentException: Unknown entry type 7 for version 0. At position LogPosition{logVersion=170, byteOffset=16} and entry version V1_9
at org.neo4j.kernel.impl.transaction.log.entry.LogEntryVersion.entryParser(LogEntryVersion.java:207)
at org.neo4j.kernel.impl.transaction.log.entry.VersionAwareLogEntryReader.readLogEntry(VersionAwareLogEntryReader.java:92)
at org.neo4j.kernel.impl.transaction.log.LogEntryCursor.next(LogEntryCursor.java:54)
at org.neo4j.kernel.recovery.LatestCheckPointFinder.find(LatestCheckPointFinder.java:77)
at org.neo4j.kernel.recovery.PositionToRecoverFrom.apply(PositionToRecoverFrom.java:53)
at org.neo4j.kernel.recovery.DefaultRecoverySPI.getPositionToRecoverFrom(DefaultRecoverySPI.java:135)
at org.neo4j.kernel.recovery.Recovery.init(Recovery.java:72)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.init(LifeSupport.java:424)
... 21 more

 -host      Domain name or IP of host to connect to (default: localhost)
 -port      Port of host to connect to (default: 1337)
 -name      RMI name, i.e. rmi://<host>:<port>/<name> (default: shell)
 -pid       Process ID to connect to
 -c         Command line to execute. After executing it the shell exits
 -file      File containing commands to execute, or '-' to read from stdin. After executing it the shell exits
 -readonly  Connect in readonly mode (only for connecting with -path)
 -path      Points to a neo4j db path so that a local server can be started there
 -config    Points to a config file when starting a local server

Example arguments for remote:
-port 1337
-host 192.168.1.234 -port 1337 -name shell
-host localhost -readonly
...or no arguments for default values
Example arguments for local:
-path /path/to/db
-path /path/to/db -config /path/to/neo4j.config
-path /path/to/db -readonly

Michael Hunger

Aug 13, 2016, 2:32:44 PM
To: ne...@googlegroups.com
Hi John,

thanks a lot for reporting back.

Would you mind creating a GH issue (if possible reproducible with a minimal test)?

Do you get a clean shutdown (db.shutdown()) from your program when creating the data?

I haven't seen an error with that kind of property on recovery.
Does the recovery error also happen when you open the db again from your java program?

Thanks so much,

Michael




John Fry

Aug 13, 2016, 11:07:21 PM
To: Neo4j
Hi Michael, here is an update
  • I found that I was writing NaNs into the array, which caused the exception above. I fixed this with a simple trap:
if (Float.isFinite(lcm))
    sl.lcm[idx] = lcm;
else
    sl.lcm[idx] = -1.0f;

  • When this was fixed I could write to all 200M relationships and the db would open in neo4j-shell - BUT it would flag an exception when exiting and closing, i.e. it wouldn't do a clean shutdown.
    • By turning off all transaction logging the db now opens and closes without issues and all 200M 16 element float arrays are successfully written.
So, to get this working I have to trap for NaNs and turn off transaction logging then my app will write all 200M property arrays and open and close cleanly in neo4j-shell.
I haven't yet tried opening it from a java app - I will let you know if I have issues.
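
A minimal sketch of the NaN trap applied to a whole weight vector before it is written, assuming the same -1.0f sentinel used above. The class name WeightSanitizer is hypothetical. Float.isFinite (Java 8+) rejects infinities as well as NaN.

```java
import java.util.Arrays;

/** Sketch: replace non-finite weights with a sentinel before storing the array. */
final class WeightSanitizer {
    /** Returns a copy of values in which NaN and +/-Infinity are replaced by sentinel. */
    static float[] sanitize(float[] values, float sentinel) {
        float[] out = values.clone();
        for (int i = 0; i < out.length; i++) {
            if (!Float.isFinite(out[i])) {
                out[i] = sentinel;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        float[] raw = {1.0f, Float.NaN, Float.POSITIVE_INFINITY};
        System.out.println(Arrays.toString(sanitize(raw, -1.0f)));
    }
}
```

Sanitizing a copy keeps the in-memory scores untouched, so the original values remain available for debugging where the NaNs came from.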

What is a GH issue and how do I file one?

Rgds, John

Michael Hunger

Aug 14, 2016, 7:00:04 AM
To: ne...@googlegroups.com
Hi John,

Thanks a lot for the detailed feedback.

GH issues is the bug tracker on GitHub: https://github.com/neo4j/neo4j/issues

Please report the NaN issue (which might not be so easy to resolve).

And, more importantly, please report the need to disable the transaction log (which is actually the write-ahead log that takes care of recovering already committed transactions after machine or program crashes).

There was no out of disk issue I presume? You can configure the transaction log to be either disabled or to only hold a certain amount or for a certain number of days.

Michael

