Re: [Aurelius] Faunus: Incremental loading for nodes and edges?


David

Nov 13, 2013, 9:29:15 PM11/13/13
to aureliu...@googlegroups.com
Seeing these come out during this phase:
INFO mapreduce.FaunusCompiler: Executing job 3 out of 3: MapSequence[com.thinkaurelius.faunus.formats.BlueprintsGraphOutputMapReduce.Map, com.thinkaurelius.faunus.formats.BlueprintsGraphOutputMapReduce.Reduce]

java.io.IOException: Blocklist for /user/graphie/output/job-1/part-r-00000 has changed!
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.fetchLocatedBlocks(DFSClient.java:2218)
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:2177)
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:2676)
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:2454)
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2617)
	at java.io.DataInputStream.readFully(DataInputStream.java:195)
	at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
	at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1992)
	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2124)
	at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.nextKeyValue(SequenceFileRecordReader.java:68)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:530)
	at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:363)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)

One reference here implies that the reader and writer are bumping into each other:
http://mail-archives.apache.org/mod_mbox/hadoop-common-user/200711.mbox/%3C473F8B4...@cs.washington.edu%3E


On Wednesday, November 13, 2013 5:57:17 PM UTC-5, David wrote:
Hi Marko,

Still not getting it....

>> If you don't have a long, and you know the vertex is completely unique (not incremental), then you can do currentTimeMillis().
I do not have a long value but it is an incremental update.

In a previous Faunus run, two nodes were created with unique properties - "name=marko" and "name=david".

Now I get brand-new edge information that says "marko helps david".
I want to load just that edge, and that's all the information I have.

In the sample ScriptInput.groovy file, the read method needs two longs: one in v.reuse() and
one in addEdge(). I don't know what to put in either of those calls.

If I can get into the createOrUpdate method, I can always use the unique property (name == marko) to
find the 'from' vertex (and the to vertex also if needed).  Just not sure about the longs in the
read method.
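One common way to synthesize the long that read needs, when all you have is a unique string key, is to hash the key so the same key always maps to the same temporary ID within a job. A minimal Java sketch of the idea (illustration only, not the Faunus API; the class and method names are hypothetical):

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/**
 * Sketch: derive a stable temporary long ID from a unique string key,
 * so the same key always yields the same ID across one Faunus input.
 * (Illustration only; not the Faunus API.)
 */
public class TempIds {
    // CRC32 of the key's UTF-8 bytes gives a deterministic non-negative long.
    static long tempId(String uniqueKey) {
        CRC32 crc = new CRC32();
        crc.update(uniqueKey.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    public static void main(String[] args) {
        long marko = tempId("marko");
        long david = tempId("david");
        // Same key, same ID; distinct keys yield distinct IDs here.
        assert marko == tempId("marko");
        assert marko != david;
        System.out.println("marko -> " + marko + ", david -> " + david);
    }
}
```

With such a scheme, "marko helps david" could be turned into an edge between the two derived temporary IDs, leaving the real ID resolution to the incremental-loading step.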


On Wednesday, November 13, 2013 11:15:01 AM UTC-5, Marko Rodriguez wrote:
Hello,

How does one incrementally add edges using Faunus ?

It simply adds more edges to the vertex found by getOrCreate().

I've got data that looks like fromID, toID, label, [key/value edge properties], but the fromID and toID aren't Longs; they are
unique properties on vertices that were previously written… so I can't figure out how to get the Long that refers to the "to" vertex
in order to do an addEdge.

getOrCreate() happens at the map()-phase and thus, for EVERY vertex in your input dataset, the corresponding vertex in Titan is either gotten or created. When you move onto the reduce()-phase you are writing edges and all ID handling is already taken care of for you.
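The get-or-create contract described here can be illustrated with a toy, map-backed version (a plain Java stand-in, not the Titan/Faunus API; all names below are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy illustration of the getOrCreate() contract: the map()-phase
 * resolves each incoming vertex to an existing or new graph vertex by
 * its unique key, so the reduce()-phase can write edges without ever
 * seeing the raw input IDs. (Stand-in only; not the Titan/Faunus API.)
 */
public class GetOrCreateDemo {
    private final Map<String, Long> index = new HashMap<>(); // unique key -> graph ID
    private long nextId = 0;

    // Return the graph ID for this key, creating a vertex if absent.
    long getOrCreate(String uniqueKey) {
        return index.computeIfAbsent(uniqueKey, k -> nextId++);
    }

    public static void main(String[] args) {
        GetOrCreateDemo g = new GetOrCreateDemo();
        long marko = g.getOrCreate("marko"); // created
        long david = g.getOrCreate("david"); // created
        // A second lookup finds the same vertex instead of creating another.
        assert marko == g.getOrCreate("marko");
        assert marko != david;
        System.out.println("marko=" + marko + " david=" + david);
    }
}
```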

I have looked at the following set of links multiple times and cannot find a statement that explicitly explains the relationship between
my custom read method in the ScriptInput.groovy file and the getOrCreateVertex method in my IncrementalLoading.groovy file.
Maybe this is just a Duh, but not one I got just yet.
 
Here are links I've reviewed several times:
    https://github.com/thinkaurelius/faunus/wiki/Script-Format
    https://github.com/thinkaurelius/faunus/wiki/Titan-Format#incremental-data-loading-using-a-titanoutputformat
    https://github.com/thinkaurelius/faunus/wiki/Command-Line-Usage
    https://github.com/thinkaurelius/faunus/wiki/Faunus-Graph-Configuration
    https://groups.google.com/forum/#!topic/aureliusgraphs/42Fdy3tVuDk
    https://github.com/thinkaurelius/titan/wiki/Graph-Configuration
    https://groups.google.com/forum/#!searchin/aureliusgraphs/load$20an$20edgelist/aureliusgraphs/B4u0auqqVQk/lBRIPmmuvUYJ

When is the getOrCreateVertex method in the IncrementalLoading script called by Faunus?
Is it called after the custom read method in the ScriptInput.groovy file has returned "true" back to Faunus?

Also, I have a similar problem as Kevin in this post: https://groups.google.com/forum/#!topic/aureliusgraphs/42Fdy3tVuDk
where I do not have a Long to use in the v.reuse method call. The data that describes the nodes is coming from
multiple m/r output files (part-r-00000, etc.) which consist only of variable key/value pair sequences.

What is the best practice for setting this Long in this case? Use -1 like Kevin did? Can I just stick a currentTimeMillis in there in the read method,
and will this value somehow be discarded after the Faunus job completes?


If you don't have a long, and you know the vertex is completely unique (not incremental), then you can do currentTimeMillis().
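One caveat with a bare currentTimeMillis(): two vertices read in the same millisecond would get the same temporary ID. A sketch of one way around that, packing a per-process counter into the low bits (illustrative Java only, not Faunus code; names are hypothetical):

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Sketch: currentTimeMillis() alone can collide when two vertices are
 * read within the same millisecond; shifting the timestamp and packing
 * a per-process counter into the low 20 bits keeps the IDs unique.
 * (Illustration only; not Faunus code.)
 */
public class UniqueTempIds {
    private static final AtomicLong COUNTER = new AtomicLong();

    // Timestamp in the high bits, counter in the low 20 bits.
    static long nextId() {
        return (System.currentTimeMillis() << 20) | (COUNTER.getAndIncrement() & 0xFFFFF);
    }

    public static void main(String[] args) {
        long a = nextId();
        long b = nextId();
        // Unlike two back-to-back currentTimeMillis() calls, these differ.
        assert a != b;
        System.out.println(a + " " + b);
    }
}
```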


I have the example running (loading the adjacency matrix) and also am loading my own data, but want to understand the
incremental update thing a bit better.

HTH,
Marko.

David

Nov 14, 2013, 10:12:53 AM11/14/13
to aureliu...@googlegroups.com
The latest exception looks like this:

com.google.common.util.concurrent.UncheckedExecutionException: java.lang.NullPointerException
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2234)
    at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
    at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
    at com.thinkaurelius.titan.graphdb.transaction.vertexcache.LRUVertexCache.get(LRUVertexCache.java:51)
    at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx.getExistingVertex(StandardTitanTx.java:262)
    at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$5$7.apply(StandardTitanTx.java:882)
    at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$5$7.apply(StandardTitanTx.java:877)
    at com.google.common.collect.Iterators$8.next(Iterators.java:812)
    at com.thinkaurelius.titan.graphdb.query.QueryProcessor$LimitAdjustingIterator.next(QueryProcessor.java:247)
    at com.thinkaurelius.titan.graphdb.query.QueryProcessor$LimitAdjustingIterator.next(QueryProcessor.java:211)
    at com.google.common.collect.Iterators$7.computeNext(Iterators.java:648)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at com.thinkaurelius.titan.graphdb.query.QueryProcessor$OuterIterator.nextInternal(QueryProcessor.java:74)
    at com.thinkaurelius.titan.graphdb.query.QueryProcessor$OuterIterator.<init>(QueryProcessor.java:64)
    at com.thinkaurelius.titan.graphdb.query.QueryProcessor.iterator(QueryProcessor.java:45)
    at com.google.common.collect.Iterables$7.iterator(Iterables.java:609)
    at com.google.common.collect.Iterables.getOnlyElement(Iterables.java:280)
    at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx.getType(StandardTitanTx.java:583)
    at com.thinkaurelius.titan.graphdb.blueprints.TitanBlueprintsGraph.getType(TitanBlueprintsGraph.java:195)
    at com.thinkaurelius.faunus.formats.titan.SchemaInferencerMapReduce$Reduce.reduce(SchemaInferencerMapReduce.java:95)
    at com.thinkaurelius.faunus.formats.titan.SchemaInferencerMapReduce$Reduce.reduce(SchemaInferencerMapReduce.java:71)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.NullPointerException
    at com.thinkaurelius.titan.graphdb.types.vertices.TitanTypeVertex.getName(TitanTypeVertex.java:27)
    at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$VertexConstructor.get(StandardTitanTx.java:299)
    at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$VertexConstructor.get(StandardTitanTx.java:269)
    at com.thinkaurelius.titan.graphdb.transaction.vertexcache.LRUVertexCache$2.call(LRUVertexCache.java:54)
    at com.thinkaurelius.titan.graphdb.transaction.vertexcache.LRUVertexCache$2.call(LRUVertexCache.java:51)
    at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4767)
    at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
    at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
    at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
    ... 29 more

Marko Rodriguez

Nov 14, 2013, 11:38:06 AM11/14/13
to aureliu...@googlegroups.com
David,

This is a stack trace with no information.

Instead of using the mailing list, perhaps you can write up a ticket in the Faunus issue tracker that demonstrates how to reproduce your problem. 

For instance:

1. Use the Graph of the Gods dataset.
2. Demonstrate step-by-step how to reproduce your problem.
3. Say what version of software you are using (Hadoop, Faunus, Titan).

Just pasting a stack trace to this mailing list doesn't entice me to dig deeper.

Thanks,
Marko.
--
You received this message because you are subscribed to the Google Groups "Aurelius" group.
To unsubscribe from this group and stop receiving emails from it, send an email to aureliusgraph...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
