How to increase performance on large graphs


George L

Mar 3, 2011, 11:33:07 AM
to OrientDB
Hello,

I am trying to create a graph of 7,000 vertices and 80,000 edges with the massive insert intent set. I have also tried the various performance-tuning options mentioned in the wiki page.

Performance is good (a matter of seconds) for up to 500-700 vertices and their edges, but it starts degrading sharply after that.

I am facing exactly the same problems described in these two threads:
http://groups.google.com/group/orient-database/browse_thread/thread/6b67d44dc7da1df8/2a4b0019e71b8ba3?lnk=gst&q=load+time#2a4b0019e71b8ba3
http://groups.google.com/group/orient-database/browse_thread/thread/cf7788acd7128398/acdba1f07d79c8cc?lnk=gst&q=load+time#acdba1f07d79c8cc

I am not too worried about bulk insert performance; it's the reads that will matter most.

It's very hard to wait hours for the graph to build, at least during development.

If there are any alternatives for creating vertices and edges in bulk, as mentioned in the last post of the second thread, I would be glad to see them.

Other graph databases use some kind of in-memory store to create edges or check for their presence, and then do a bulk insert, so I guess that approach is natural for graph databases.

I would like to see similar alternatives being used with OrientDB.

Thanks
George

Luca Garulli

Mar 3, 2011, 11:58:34 AM
to orient-database, George L
On 3 March 2011 17:33, George L <georg...@gmail.com> wrote:
> Hello,
>
> I am trying to create a graph of 7,000 vertices and 80,000 edges with the massive insert intent set. I have also tried the various performance-tuning options mentioned in the wiki page.
>
> Performance is good (a matter of seconds) for up to 500-700 vertices and their edges, but it starts degrading sharply after that.
>
> I am facing exactly the same problems described in these two threads:
> http://groups.google.com/group/orient-database/browse_thread/thread/6b67d44dc7da1df8/2a4b0019e71b8ba3?lnk=gst&q=load+time#2a4b0019e71b8ba3
> http://groups.google.com/group/orient-database/browse_thread/thread/cf7788acd7128398/acdba1f07d79c8cc?lnk=gst&q=load+time#acdba1f07d79c8cc
>
> I am not too worried about bulk insert performance; it's the reads that will matter most.
>
> It's very hard to wait hours for the graph to build, at least during development.
>
> If there are any alternatives for creating vertices and edges in bulk, as mentioned in the last post of the second thread, I would be glad to see them.
 
If you're interested in read performance, you need to turn on the cache:
  • if you've set a massive insert intent (perhaps at the beginning, for the bulk insertion), remove it afterwards: db.declareIntent(null); (see the sketch below)
  • leave the cache size at -1 and OrientDB will use all the resources of the system. If the free heap drops below 15% (this is configurable), it optimizes its internal structures to free memory.
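For instance, the typical lifecycle looks like this (just a minimal sketch; the bulk load itself is a placeholder):

// enable the massive insert intent only for the bulk load phase
db.declareIntent(new OIntentMassiveInsert());

// ... create all the vertices and edges here ...

// remove the intent once the load is done, so the cache is
// used again for the read-heavy phase
db.declareIntent(null);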
To see better what's happening, activate the profiler:

OProfiler.getInstance().startRecording();

...

System.out.println(OProfiler.getInstance().dump());

 
> Other graph databases use some kind of in-memory store to create edges or check for their presence, and then do a bulk insert, so I guess that approach is natural for graph databases.

If you have the cache enabled, OrientDB does exactly this: records stay in memory once loaded or created.
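For instance (a minimal sketch; rid here stands for a record id you already hold):

// the first load reads from the storage and puts the record in the cache
ODocument doc = db.load(rid);

// further loads of the same rid are then served from memory
doc = db.load(rid);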
 
> I would like to see similar alternatives being used with OrientDB.
>
> Thanks
> George

Lvc@

George L

Mar 3, 2011, 12:29:49 PM
to OrientDB
I am working on bulk inserts now; I need to improve bulk insert performance.

How easy or difficult would a scenario like this be?

1. Create all vertex documents in one bulk operation, keeping the record ids in memory.
2. Create all edge documents; since each has just one in and one out, use the record ids from memory and create the edge documents in bulk.
3. Keep all the ins and outs for each vertex record id in memory; at regular intervals, fetch the vertices one by one and insert all the in and out record ids for that vertex in bulk. (A rough sketch of steps 1 and 2 follows below.)
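For example (a rough sketch of steps 1 and 2 with the document API; the class names "MyVertex"/"MyEdge", the fields, and the input collections are just placeholders I made up):

import java.util.HashMap;
import java.util.Map;

import com.orientechnologies.orient.core.id.ORID;
import com.orientechnologies.orient.core.record.impl.ODocument;

// db is an open ODatabaseDocumentTx; vertexKeys and edgePairs hold my input data

// step 1: create all vertex documents, keeping the record ids in memory
Map<String, ORID> rids = new HashMap<String, ORID>();
for (String key : vertexKeys) {
    ODocument v = new ODocument(db, "MyVertex");
    v.field("key", key);
    v.save();
    rids.put(key, v.getIdentity().copy()); // defensive copy: the ORID belongs to the record
}

// step 2: create all edge documents; each has just one in and one out
for (String[] pair : edgePairs) { // pair = { fromKey, toKey }
    ODocument e = new ODocument(db, "MyEdge");
    e.field("out", rids.get(pair[0]));
    e.field("in", rids.get(pair[1]));
    e.save();
}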

I don't really know how the storage mechanism works.

Does this make sense? Is it possible to achieve? Right now I don't see any APIs that support this either, which might be for a good reason, I guess.

George


Luca Garulli

Mar 3, 2011, 1:05:14 PM
to orient-database, George L
Hi George,
are you the same George who wrote:

"Thanks Luca, I set the massive insert intent and the performance is very good now. The operation that took 3+ mins is taking only 24 secs"?

So do you still have problems with insertion?

This operation is quite simple, but I can't help much if you don't send me the profiler values.

Lvc@

George L

Mar 3, 2011, 2:43:28 PM
to OrientDB
Yes, that was me.

That was the result of a short test, not a big one.

The profiler dump I attached is also not for the full graph, just a part of the full graph I am trying to build:

OrientDB profiler dump of Thu Mar 03 13:21:34 CST 2011
Free memory: 144.88Mb (4.51%) - Total memory: 1827.15Mb - Max memory: 3212.51Mb - CPUs: 4

HOOK VALUES:
+--------------------------------------------+----------------------+
| Name                                       | Value                |
+--------------------------------------------+----------------------+
| db.tt.cache.current                        | 44221                |
| db.tt.cache.max                            | -1                   |
| index.Brands.pk.entryPointSize             | 16                   |
| index.Brands.pk.items                      | 134                  |
| index.Brands.pk.maxUpdateBeforeSave        | 5000                 |
| index.Brands.pk.optimizationThreshold      | 100000               |
| index.Categories.pk.entryPointSize         | 16                   |
| index.Categories.pk.items                  | 17                   |
| index.Categories.pk.maxUpdateBeforeSave    | 5000                 |
| index.Categories.pk.optimizationThreshold  | 100000               |
| index.Products.pk.entryPointSize           | 16                   |
| index.Products.pk.items                    | 4029                 |
| index.Products.pk.maxUpdateBeforeSave      | 5000                 |
| index.Products.pk.optimizationThreshold    | 100000               |
| index.Sites.pk.entryPointSize              | 16                   |
| index.Sites.pk.items                       | 1                    |
| index.Sites.pk.maxUpdateBeforeSave         | 5000                 |
| index.Sites.pk.optimizationThreshold       | 100000               |
| index.SpecValues.pk.entryPointSize         | 16                   |
| index.SpecValues.pk.items                  | 7833                 |
| index.SpecValues.pk.maxUpdateBeforeSave    | 5000                 |
| index.SpecValues.pk.optimizationThreshold  | 100000               |
| index.Specs.pk.entryPointSize              | 16                   |
| index.Specs.pk.items                       | 130                  |
| index.Specs.pk.maxUpdateBeforeSave         | 5000                 |
| index.Specs.pk.optimizationThreshold       | 100000               |
| memory.alerts                              | 0                    |
| mmap.blockSize                             | 327680               |
| mmap.blocks                                | 520                  |
| mmap.maxMemory                             | 2612889600           |
| mmap.strategy                              | MMAP_ONLY_AVAIL_POOL |
| mmap.totalMemory                           | 168667714            |
| storage.tt.cache.current                   | 1                    |
| storage.tt.cache.max                       | -1                   |
+--------------------------------------------+----------------------+
DUMPING COUNTERS (last reset on: Thu Mar 03 12:44:26 CST 2011)...
+---------------------------------------+-----------+
| Name                                  | Value     |
+---------------------------------------+-----------+
| ODataLocal.setRecord:new.space        | 156169    |
| ODataLocal.setRecord:tot.reused.space | 52082     |
| OMMapManager.reusedPage               | 933143379 |
| OMMapManager.reusedPageBetweenLast    | 8283621   |
| OMVRBTreeEntryP.serializeKey          | 5000      |
| OMVRBTreeEntryP.serializeValue        | 5000      |
| OMemOutStream.resize                  | 183       |
| db.tt.cache.found                     | 19013631  |
| db.tt.cache.notFound                  | 156353006 |
+---------------------------------------+-----------+

DUMPING STATISTICS (last reset on: Thu Mar 03 12:44:26 CST 2011). Times in ms...
+--------------------------------------+------+---------+-----+-----+---------+--------+
| Name                                 | last | total   | min | max | average | items  |
+--------------------------------------+------+---------+-----+-----+---------+--------+
| [OMVRBTree.getEntry] Steps of search | 24   | 1034286 | 0   | 26  | 7       | 140512 |
+--------------------------------------+------+---------+-----+-----+---------+--------+

DUMPING CHRONOS (last reset on: Thu Mar 03 12:44:26 CST 2011). Times in ms...
+--------------------------------------------+------+---------+-----+------+---------+-----------+
| Name                                       | last | total   | min | max  | average | items     |
+--------------------------------------------+------+---------+-----+------+---------+-----------+
| OMMapManager.loadPage                      | 0    | 484     | 0   | 341  | 0       | 511       |
| OMVRBTreeEntryP.toStream                   | 0    | 76      | 0   | 13   | 1       | 65        |
| OMVRBTreePersistent.commitChanges          | 79   | 79      | 79  | 79   | 79      | 1         |
| OMVRBTreePersistent.put                    | 0    | 146     | 0   | 79   | 0       | 12144     |
| OMVRBTreePersistent.toStream               | 0    | 0       | 0   | 0    | 0       | 1         |
| ORecordSerializerStringAbstract.fromStream | 0    | 1763619 | 0   | 1371 | 0       | 156322048 |
| ORecordSerializerStringAbstract.toStream   | 0    | 43443   | 0   | 341  | 0       | 272389    |
| OStorageLocal.createRecord                 | 0    | 337     | 0   | 16   | 0       | 64226     |
| OStorageLocal.foreach                      | 65   | 2184499 | 0   | 1433 | 20      | 108300    |
| OStorageLocal.readRecord                   | 0    | 203147  | 0   | 258  | 0       | 156353006 |
| OStorageLocal.updateRecord                 | 0    | 1743    | 0   | 341  | 0       | 208251    |
+--------------------------------------------+------+---------+-----+------+---------+-----------+
java.lang.Error: Cleaner terminated abnormally
    at sun.misc.Cleaner$1.run(Cleaner.java:130)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.misc.Cleaner.clean(Cleaner.java:127)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:124)
Caused by: java.io.IOException: The process cannot access the file because another process has locked a portion of the file
    at sun.nio.ch.FileChannelImpl.unmap0(Native Method)
    at sun.nio.ch.FileChannelImpl.access$100(FileChannelImpl.java:32)
    at sun.nio.ch.FileChannelImpl$Unmapper.run(FileChannelImpl.java:667)
    at sun.misc.Cleaner.clean(Cleaner.java:125)
    ... 1 more

