Creating a database in batch mode

Alexandre Vallette

May 15, 2013, 3:40:55 AM
to thunderd...@googlegroups.com
Hello,

I've forked thunderdome to implement other properties that were needed in my model (time, date, timedelta), so I'm ready to contribute to this nice project.

My current problem with thunderdome is the time it takes to load a db: I have a graph with 500k edges and loading it took more than a night (see the attached log file).
So first of all, am I doing something wrong? Are there options that should be tuned?

If not, I heard that there is a way to load data in batch mode with Titan. Where should I start to implement this in thunderdome?

By the way, I use Titan 0.3 and I get an error when connecting to the database, but if I wrap the connection in try/except it works fine, though there is still an error message in the log.
log.txt

Alexandre Vallette

May 15, 2013, 4:22:43 AM
to thunderd...@googlegroups.com
Setting
storage.batch-loading=true

in the configuration already makes a difference, but I'm looking carefully at https://github.com/tinkerpop/blueprints/wiki/Batch-Implementation
and thinking about how it could be plugged into thunderdome.

Alexandre Vallette

May 15, 2013, 10:53:20 AM
to thunderd...@googlegroups.com
I did some work on this, but my only problem is where to define the BatchGraph so that it is accessible from the gremlin methods...

Jonathan Haddad

May 15, 2013, 10:56:03 AM
to Alexandre Vallette, thunderd...@googlegroups.com
Hi Alexandre,

Loading vertices & edges one by one over REST will definitely be slow, for a few reasons.

#1. The overhead of REST is a significant percentage of each operation. We have an open issue to switch to rexpro, since it recently got a major upgrade.
#2. Each transaction has significant overhead. Once we have rexpro implemented, we want to include something like this:

with thunderdome.transaction() as t:
   # lots of stuff

where at the end of the transaction, all the mutations are committed.  This will give a huge performance boost as well.  
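
The idea of collecting mutations and committing them together can be sketched in plain Python. This is not thunderdome's API (thunderdome.transaction() was only planned at this point in the thread): MutationBuffer and its execute hook are hypothetical names, and the g.commit() call is an assumption about the Blueprints version in use. The sketch only shows the batching pattern: statements are queued, then sent as a single script when the with block exits cleanly.

```python
class MutationBuffer(object):
    """Queue Gremlin statements and send them as one script on exit.

    Hypothetical helper for illustration; not part of thunderdome.
    """

    def __init__(self, execute, commit_statement="g.commit()"):
        # execute: any callable that takes one script string, for example
        # a thin wrapper around whatever client sends Gremlin to the server.
        self._execute = execute
        # Assumed commit call; adjust to the graph API actually deployed.
        self._commit = commit_statement
        self._statements = []

    def add(self, statement):
        # Drop trailing separators so the joined script stays well formed.
        self._statements.append(statement.rstrip("; \n"))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Send only if the block finished without raising and has work.
        if exc_type is None and self._statements:
            script = "; ".join(self._statements + [self._commit])
            self._execute(script)
        return False  # never swallow exceptions from the block
```

With this shape, one round trip and one commit cover every queued mutation, which is where the saving over one request per vertex or edge would come from. If the block raises, nothing is sent at all.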

As far as the batch graph goes - that's a little more complicated.  I don't know the current status on rexster when it comes to wrapping graphs.  Blake might be able to better answer this one.


--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade

Blake Eggleston

May 15, 2013, 12:56:48 PM
to thunderd...@googlegroups.com
Hi Alexandre,

I took a look at your log file. On connection, thunderdome tries to create an index for the vid field. However, since Titan's syntax for defining types and indices changed from Titan 0.2.1 to 0.3.0, that will fail, and it appears to be what's causing the exception you're seeing.

Regarding the batch graph stuff, rexster does not really support graph wrappers at the moment, especially over REST. It is possible to wrap a graph in a rexpro session, but a) thunderdome doesn't support rexpro at the moment, and b) that will be strange, since the graph will only be wrapped in that session and not rexster-wide.

Alexandre Vallette

May 15, 2013, 2:47:59 PM
to thunderd...@googlegroups.com
Thanks for your advice.

I figured out that for the connection it was Titan 0.3 that was causing trouble, so I changed
   create_unique_index('vid', 'String')
to
   execute_query("g.makeType().name('vid').dataType(String.class).indexed(Vertex.class).unique(Direction.BOTH).makePropertyKey()")

Concerning rexpro, I gave it a try, but the problem is with sessions:
I can't keep a handle on the BatchGraph wrapper:

In [19]: conn.execute("bg = new BatchGraph(g,VertexIDType.STRING,1000) ; bg.setVertexIdKey('vid'); bg.addVertex(1)")
Out[19]: {'_id': '1', '_properties': {'vid': 1}, '_type': 'vertex'}

works, but:

In [20]: conn.execute("bg = new BatchGraph(g,VertexIDType.STRING,1000) ; bg.setVertexIdKey('vid');")

In [21]: conn.execute("bg.addVertex()")
---------------------------------------------------------------------------
RexProScriptException                     Traceback (most recent call last)
<ipython-input-21-b98d150752b7> in <module>()
----> 1 conn.execute("bg.addVertex()")

/usr/local/lib/python2.7/site-packages/rexpro/connection.pyc in execute(self, script, params, isolate, transaction, pretty)
    246 
    247         if isinstance(response, messages.ErrorResponse):
--> 248             raise exceptions.RexProScriptException(response.message)
    249 
    250         return response.results

RexProScriptException: An error occurred while processing the script for language [groovy]. All transactions across all graphs in the session have been concluded with failure: java.util.concurrent.ExecutionException: javax.script.ScriptException: javax.script.ScriptException: groovy.lang.MissingPropertyException: No such property: bg for class: Script3

Jon Haddad

May 15, 2013, 11:20:04 PM
to thunderd...@googlegroups.com
Unfortunately that BatchGraph object will only exist for the duration of the request, and not be available for subsequent ones.  It's a limitation of Rexster.
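
Given that limitation, one possible workaround is to keep the whole BatchGraph lifecycle inside a single request, as the working In [19] call above already does. The following is a minimal sketch under several assumptions: execute(script, params) binds params as script variables (as the params argument of rexpro's conn.execute in the traceback suggests), nested lists survive serialization, and bg.commit() is available in the Blueprints version in use. BATCH_SCRIPT and load_edges_in_one_request are hypothetical names, not part of thunderdome or rexpro.

```python
# Groovy script run server side. The BatchGraph is created, used, and
# committed within one request, so no handle has to survive between calls.
BATCH_SCRIPT = """
def bg = new BatchGraph(g, VertexIDType.STRING, bufferSize)
bg.setVertexIdKey('vid')
edges.each { e ->
    def src = bg.getVertex(e[0]) ?: bg.addVertex(e[0])
    def dst = bg.getVertex(e[1]) ?: bg.addVertex(e[1])
    bg.addEdge(null, src, dst, e[2])
}
bg.commit()   // assumed to exist in the deployed Blueprints version
edges.size()
"""


def load_edges_in_one_request(execute, edges, buffer_size=1000):
    """Send (source_id, target_id, label) triples in a single script call.

    execute is any callable taking (script, params); it is a stand-in for
    the real client call and is an assumption of this sketch.
    """
    params = {
        # VertexIDType.STRING expects string ids, so convert up front.
        "edges": [[str(s), str(d), str(label)] for s, d, label in edges],
        "bufferSize": buffer_size,
    }
    return execute(BATCH_SCRIPT, params)
```

A caveat of this approach: each request builds a fresh BatchGraph, so splitting a load across several requests would lose the vertex id cache between them, and a very large edge list in one request may run into message size or memory limits.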