Re: [Aurelius] Faunus: Incremental loading for nodes and edges?


David

Nov 13, 2013, 9:29:15 PM11/13/13
to aureliu...@googlegroups.com
Seeing these come out during this phase:
INFO mapreduce.FaunusCompiler: Executing job 3 out of 3: MapSequence[com.thinkaurelius.faunus.formats.BlueprintsGraphOutputMapReduce.Map, com.thinkaurelius.faunus.formats.BlueprintsGraphOutputMapReduce.Reduce]

java.io.IOException: Blocklist for /user/graphie/output/job-1/part-r-00000 has changed!
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.fetchLocatedBlocks(DFSClient.java:2218)
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:2177)
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:2676)
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:2454)
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2617)
	at java.io.DataInputStream.readFully(DataInputStream.java:195)
	at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
	at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1992)
	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2124)
	at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.nextKeyValue(SequenceFileRecordReader.java:68)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:530)
	at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:363)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)

One reference here implies that the reader and writer are bumping into each other:
http://mail-archives.apache.org/mod_mbox/hadoop-common-user/200711.mbox/%3C473F8B4...@cs.washington.edu%3E


On Wednesday, November 13, 2013 5:57:17 PM UTC-5, David wrote:
Hi Marko,

Still not getting it....

>> If you don't have a long, and you know the vertex is completely unique (not incremental), then you can do currentTimeMillis().
I do not have a long value but it is an incremental update.

In a previous Faunus run, two nodes were created with unique properties - "name=marko" and "name=david".

Now I get brand-new edge information that says "marko helps david".
I want to load just that edge, and that's all the information I have.

In the sample ScriptInput.groovy file, the read method needs two longs: one in v.reuse() and
one in addEdge(). I don't know what to put in either of those calls.

If I can get into the createOrUpdate method, I can always use the unique property (name == marko) to
find the 'from' vertex (and the to vertex also if needed).  Just not sure about the longs in the
read method.
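One common way to synthesize the long that read needs, when all you have is a unique string key, is to hash the key so the same key always maps to the same temporary ID within a job. A minimal Java sketch of the idea (illustration only, not the Faunus API; the class and method names are hypothetical):

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/**
 * Sketch: derive a stable temporary long ID from a unique string key,
 * so the same key always yields the same ID across one Faunus input.
 * (Illustration only; not the Faunus API.)
 */
public class TempIds {
    // CRC32 of the key's UTF-8 bytes gives a deterministic non-negative long.
    static long tempId(String uniqueKey) {
        CRC32 crc = new CRC32();
        crc.update(uniqueKey.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    public static void main(String[] args) {
        long marko = tempId("marko");
        long david = tempId("david");
        // Same key, same ID; distinct keys yield distinct IDs here.
        assert marko == tempId("marko");
        assert marko != david;
        System.out.println("marko -> " + marko + ", david -> " + david);
    }
}
```

With such a scheme, "marko helps david" could be turned into an edge between the two derived temporary IDs, leaving the real ID resolution to the incremental-loading step.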


On Wednesday, November 13, 2013 11:15:01 AM UTC-5, Marko Rodriguez wrote:
Hello,

How does one incrementally add edges using Faunus ?

It simply adds more edges to the vertex found by getOrCreate().

I've got data that looks like fromID, toID, label, [key/value edge properties], but the fromID and toID aren't Longs; they are
unique properties on vertices that were previously written… so I can't figure out how to get the Long that refers to the "to" vertex
in order to do an addEdge.

getOrCreate() happens at the map()-phase and thus, for EVERY vertex in your input dataset, the corresponding vertex in Titan is either gotten or created. When you move onto the reduce()-phase you are writing edges and all ID handling is already taken care of for you.
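The get-or-create contract described here can be illustrated with a toy, map-backed version (a plain Java stand-in, not the Titan/Faunus API; all names below are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy illustration of the getOrCreate() contract: the map()-phase
 * resolves each incoming vertex to an existing or new graph vertex by
 * its unique key, so the reduce()-phase can write edges without ever
 * seeing the raw input IDs. (Stand-in only; not the Titan/Faunus API.)
 */
public class GetOrCreateDemo {
    private final Map<String, Long> index = new HashMap<>(); // unique key -> graph ID
    private long nextId = 0;

    // Return the graph ID for this key, creating a vertex if absent.
    long getOrCreate(String uniqueKey) {
        return index.computeIfAbsent(uniqueKey, k -> nextId++);
    }

    public static void main(String[] args) {
        GetOrCreateDemo g = new GetOrCreateDemo();
        long marko = g.getOrCreate("marko"); // created
        long david = g.getOrCreate("david"); // created
        // A second lookup finds the same vertex instead of creating another.
        assert marko == g.getOrCreate("marko");
        assert marko != david;
        System.out.println("marko=" + marko + " david=" + david);
    }
}
```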

I have looked at the following set of links multiple times and cannot find a statement that explicitly explains the relationship between
my custom read method in the ScriptInput.groovy file and the getOrCreateVertex method in my IncrementalLoading.groovy file.
Maybe this is just a Duh, but not one I got just yet.
 
Here are links I've reviewed several times:
    https://github.com/thinkaurelius/faunus/wiki/Script-Format
    https://github.com/thinkaurelius/faunus/wiki/Titan-Format#incremental-data-loading-using-a-titanoutputformat
    https://github.com/thinkaurelius/faunus/wiki/Command-Line-Usage
    https://github.com/thinkaurelius/faunus/wiki/Faunus-Graph-Configuration
    https://groups.google.com/forum/#!topic/aureliusgraphs/42Fdy3tVuDk
    https://github.com/thinkaurelius/titan/wiki/Graph-Configuration
    https://groups.google.com/forum/#!searchin/aureliusgraphs/load$20an$20edgelist/aureliusgraphs/B4u0auqqVQk/lBRIPmmuvUYJ

When is the getOrCreateVertex method in the IncrementalLoading script called by Faunus?
Is it called after the custom read method in the ScriptInput.groovy file has returned "true" back to Faunus?

Also, I have a similar problem as Kevin in this post: https://groups.google.com/forum/#!topic/aureliusgraphs/42Fdy3tVuDk
where I do not have a Long to use in the v.reuse method call. The data that describes the nodes is coming from
multiple m/r output files (part-r-00000, etc.) which consist only of variable key/value pair sequences.

What is the best practice for setting this Long in this case? Use -1 like Kevin did? Can I just stick a currentTimeMillis in there in the read method,
and will this value somehow be discarded after the Faunus job completes?


If you don't have a long, and you know the vertex is completely unique (not incremental), then you can do currentTimeMillis().
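One caveat with a bare currentTimeMillis(): two vertices read in the same millisecond would get the same temporary ID. A sketch of one way around that, packing a per-process counter into the low bits (illustrative Java only, not Faunus code; names are hypothetical):

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Sketch: currentTimeMillis() alone can collide when two vertices are
 * read within the same millisecond; shifting the timestamp and packing
 * a per-process counter into the low 20 bits keeps the IDs unique.
 * (Illustration only; not Faunus code.)
 */
public class UniqueTempIds {
    private static final AtomicLong COUNTER = new AtomicLong();

    // Timestamp in the high bits, counter in the low 20 bits.
    static long nextId() {
        return (System.currentTimeMillis() << 20) | (COUNTER.getAndIncrement() & 0xFFFFF);
    }

    public static void main(String[] args) {
        long a = nextId();
        long b = nextId();
        // Unlike two back-to-back currentTimeMillis() calls, these differ.
        assert a != b;
        System.out.println(a + " " + b);
    }
}
```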


I have the example running (loading the adjacency matrix) and also am loading my own data, but want to understand the
incremental update thing a bit better.

HTH,
Marko.

David

Nov 14, 2013, 10:12:53 AM11/14/13
to aureliu...@googlegroups.com
The latest exception looks like this:

com.google.common.util.concurrent.UncheckedExecutionException: java.lang.NullPointerException
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2234)
    at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
    at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
    at com.thinkaurelius.titan.graphdb.transaction.vertexcache.LRUVertexCache.get(LRUVertexCache.java:51)
    at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx.getExistingVertex(StandardTitanTx.java:262)
    at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$5$7.apply(StandardTitanTx.java:882)
    at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$5$7.apply(StandardTitanTx.java:877)
    at com.google.common.collect.Iterators$8.next(Iterators.java:812)
    at com.thinkaurelius.titan.graphdb.query.QueryProcessor$LimitAdjustingIterator.next(QueryProcessor.java:247)
    at com.thinkaurelius.titan.graphdb.query.QueryProcessor$LimitAdjustingIterator.next(QueryProcessor.java:211)
    at com.google.common.collect.Iterators$7.computeNext(Iterators.java:648)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at com.thinkaurelius.titan.graphdb.query.QueryProcessor$OuterIterator.nextInternal(QueryProcessor.java:74)
    at com.thinkaurelius.titan.graphdb.query.QueryProcessor$OuterIterator.<init>(QueryProcessor.java:64)
    at com.thinkaurelius.titan.graphdb.query.QueryProcessor.iterator(QueryProcessor.java:45)
    at com.google.common.collect.Iterables$7.iterator(Iterables.java:609)
    at com.google.common.collect.Iterables.getOnlyElement(Iterables.java:280)
    at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx.getType(StandardTitanTx.java:583)
    at com.thinkaurelius.titan.graphdb.blueprints.TitanBlueprintsGraph.getType(TitanBlueprintsGraph.java:195)
    at com.thinkaurelius.faunus.formats.titan.SchemaInferencerMapReduce$Reduce.reduce(SchemaInferencerMapReduce.java:95)
    at com.thinkaurelius.faunus.formats.titan.SchemaInferencerMapReduce$Reduce.reduce(SchemaInferencerMapReduce.java:71)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.NullPointerException
    at com.thinkaurelius.titan.graphdb.types.vertices.TitanTypeVertex.getName(TitanTypeVertex.java:27)
    at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$VertexConstructor.get(StandardTitanTx.java:299)
    at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$VertexConstructor.get(StandardTitanTx.java:269)
    at com.thinkaurelius.titan.graphdb.transaction.vertexcache.LRUVertexCache$2.call(LRUVertexCache.java:54)
    at com.thinkaurelius.titan.graphdb.transaction.vertexcache.LRUVertexCache$2.call(LRUVertexCache.java:51)
    at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4767)
    at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
    at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
    at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
    ... 29 more

Marko Rodriguez

Nov 14, 2013, 11:38:06 AM11/14/13
to aureliu...@googlegroups.com
David,

This is a stack trace with no information.

Instead of using the mailing list, perhaps you can write up a ticket in the Faunus issue tracker that demonstrates how to reproduce your problem. 

For instance:

1. Use the Graph of the Gods dataset.
2. Demonstrate step-by-step how to reproduce your problem.
3. Say what version of software you are using (Hadoop, Faunus, Titan).

Just pasting a stack trace to this mailing list doesn't entice me to dig deeper.

Thanks,
Marko.
--
You received this message because you are subscribed to the Google Groups "Aurelius" group.
To unsubscribe from this group and stop receiving emails from it, send an email to aureliusgraph...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
