Re: [TinkerPop] Faunus 0.3 and Cassandra in embedded mode

Marko Rodriguez

Apr 27, 2013, 11:18:18 AM
to gremli...@googlegroups.com, aureliu...@googlegroups.com
Hi,

One easy thing to try off the bat -- using "cassandrathrift" as your storage.backend.

Next, did you make sure you created a keyspace in Cassandra? This is done easily using:

g = TitanFactory.open(…)
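
For example, a minimal sketch from the Gremlin console (assuming a Cassandra node reachable on localhost and a 'titan' keyspace -- adjust hostname and keyspace to your setup):

conf = new org.apache.commons.configuration.BaseConfiguration()
conf.setProperty('storage.backend', 'cassandrathrift')
conf.setProperty('storage.hostname', 'localhost')
conf.setProperty('storage.keyspace', 'titan')
g = TitanFactory.open(conf) // creates the keyspace on first open if it does not yet exist
g.shutdown()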

Finally, given that Astyanax is reporting a NoAvailableHostsException -- you might have an issue with your security permissions. Is this running on EC2? And if so, is Hadoop co-located with your Cassandra cluster? If not, then you will need to give the Hadoop cluster access to the Cassandra cluster. If your Hadoop and Cassandra clusters are co-located, you can always use 'localhost' as the hostname, since then you get co-located access to the data in the cluster.
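
For a co-located setup that is just a one-line change (the same key as in your properties file quoted below):

faunus.graph.output.titan.storage.hostname=localhost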

HTH,
Marko.


On Apr 27, 2013, at 8:45 AM, Daniel Kuppitz <daniel....@shoproach.com> wrote:

Hi,

it's the first time that I'm trying Faunus 0.3, and it doesn't work as expected. I'm trying to import a GraphSON file into a Titan/Cassandra cluster. Everything runs on a local VM, and I've double-checked IP and port settings, but I still get an exception.

My Faunus configuration file (it's actually the sample file with a changed input file location and output hostname):

# input
faunus.graph.input.format=com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat
faunus.input.location=../adam.graphson

# output
faunus.graph.output.format=com.thinkaurelius.faunus.formats.titan.cassandra.TitanCassandraOutputFormat
faunus.graph.output.titan.storage.backend=cassandra
faunus.graph.output.titan.storage.hostname=10.0.0.1
faunus.graph.output.titan.storage.port=9160
faunus.graph.output.titan.storage.keyspace=titan
faunus.graph.output.titan.storage.batch-loading=true
faunus.graph.output.titan.infer-schema=true
faunus.graph.output.blueprints.tx-commit=5000

faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
faunus.output.location=output
faunus.output.location.overwrite=true

I have 3 instances of Titan server running on different IPs (10.0.0.1, 10.0.0.2, 10.0.0.3) in embedded cassandra mode.

OK, so far so good. Then I tried to bulk load my GraphSON file:

gremlin> g = FaunusFactory.open('faunus.properties')
==>faunusgraph[graphsoninputformat->titancassandraoutputformat]
gremlin> g._
...

The last command is followed by a lot of info messages, and after ~1 minute I suddenly get an exception:

13/04/27 16:35:13 WARN mapred.LocalJobRunner: job_local_0001
java.lang.IllegalArgumentException: Could not instantiate implementation: com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxStoreManager
        at com.thinkaurelius.titan.diskstorage.Backend.getImplementationClass(Backend.java:268)
        at com.thinkaurelius.titan.diskstorage.Backend.getStorageManager(Backend.java:226)
        at com.thinkaurelius.titan.diskstorage.Backend.<init>(Backend.java:97)
        at com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration.getBackend(GraphDatabaseConfiguration.java:406)
        at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.<init>(StandardTitanGraph.java:62)
        at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:40)
        at com.thinkaurelius.faunus.formats.titan.GraphFactory.generateGraph(GraphFactory.java:20)
        at com.thinkaurelius.faunus.formats.BlueprintsGraphOutputMapReduce.generateGraph(BlueprintsGraphOutputMapReduce.java:61)
        at com.thinkaurelius.faunus.formats.titan.SchemaInferencerMapReduce$Reduce.setup(SchemaInferencerMapReduce.java:71)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:650)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:262)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
        at com.thinkaurelius.titan.diskstorage.Backend.getImplementationClass(Backend.java:257)
        ... 12 more
Caused by: com.thinkaurelius.titan.diskstorage.TemporaryStorageException: Temporary failure in storage backend
        at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxStoreManager.ensureKeyspaceExists(AstyanaxStoreManager.java:394)
        at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxStoreManager.<init>(AstyanaxStoreManager.java:164)
        ... 17 more
Caused by: com.netflix.astyanax.connectionpool.exceptions.NoAvailableHostsException: NoAvailableHostsException: [host=None(0.0.0.0):0, latency=0(0), attempts=0] No hosts to borrow from
        at com.netflix.astyanax.connectionpool.impl.RoundRobinExecuteWithFailover.<init>(RoundRobinExecuteWithFailover.java:31)
        at com.netflix.astyanax.connectionpool.impl.TokenAwareConnectionPoolImpl.newExecuteWithFailover(TokenAwareConnectionPoolImpl.java:74)
        at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:229)
        at com.netflix.astyanax.thrift.ThriftClusterImpl.executeSchemaChangeOperation(ThriftClusterImpl.java:131)
        at com.netflix.astyanax.thrift.ThriftClusterImpl.addKeyspace(ThriftClusterImpl.java:252)
        at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxStoreManager.ensureKeyspaceExists(AstyanaxStoreManager.java:389)
        ... 18 more

Looks like a connection problem, but I've double-checked all my host/port settings and can't find any error.

Any ideas?

Cheers,
Daniel


Daniel Kuppitz

Apr 27, 2013, 12:14:14 PM
to aureliu...@googlegroups.com, gremli...@googlegroups.com
Yep, I've tried cassandrathrift, but it doesn't work either. The keyspace is created (the graph is not empty). Everything (Titan server / Cassandra, Hadoop, Faunus) is running on the same machine with virtual IP addresses.
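
A quick sanity check from the Gremlin console that the Thrift port is actually reachable on one of the virtual IPs (a minimal sketch -- it throws a ConnectException if nothing is listening):

s = new java.net.Socket('10.0.0.1', 9160) // connect to the Thrift port, or fail fast
s.close()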

Some system info:

Welcome to Ubuntu 12.04.2 LTS (GNU/Linux 3.5.0-27-generic x86_64)

 * Documentation:  https://help.ubuntu.com/

  System information as of Sat Apr 27 18:01:41 CEST 2013

  System load:  0.06              Users logged in:     1
  Usage of /:   83.9% of 5.67GB   IP address for lo:1: 10.0.0.1
  Memory usage: 54%               IP address for lo:2: 10.0.0.2
  Swap usage:   0%                IP address for lo:3: 10.0.0.3
  Processes:    129               IP address for eth0: 192.168.2.105

  Graph this data and manage this system at https://landscape.canonical.com/

Last login: Sat Apr 27 15:19:18 2013 from localhost
daniel@titan:~$ ./cassandra/apache-cassandra-1.2.3/bin/nodetool -h 10.0.0.1 -p 8001 ring

Datacenter: datacenter1
==========
Replicas: 1

Address         Rack        Status State   Load            Owns                Token
                                                                               0
10.0.0.3        rack1       Up     Normal  101,09 KB       33,33%              113427455640312814857969558651062452224
10.0.0.2        rack1       Up     Normal  104,56 KB       33,33%              56713727820156407428984779325531226112
10.0.0.1        rack1       Up     Normal  152,47 KB       33,33%              0

daniel@titan:~$ ./cassandra/apache-cassandra-1.2.3/bin/nodetool -h 10.0.0.1 -p 8001 info
Token            : 0
ID               : 6ae6824d-438f-43af-87a1-e4a4df17e875
Gossip active    : false
Thrift active    : false
Load             : 152,47 KB
Generation No    : 0
Uptime (seconds) : 94682
Heap Memory (MB) : 30,52 / 455,13
Data Center      : datacenter1
Rack             : rack1
Exceptions       : 1
Key Cache        : size 313 (bytes), capacity 1048576 (bytes), 170 hits, 175 requests, NaN recent hit rate, 14400 save period in seconds
Row Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds

Cheers,
Daniel

Marko Rodriguez

Apr 27, 2013, 12:21:51 PM
to aureliu...@googlegroups.com, gremli...@googlegroups.com
Hi,

I would recommend playing with your security groups (perhaps a long shot, given that you are co-located). Also, try using "localhost" as your storage.hostname.

Marko.

Marko Rodriguez

Apr 27, 2013, 1:02:17 PM
to aureliu...@googlegroups.com, gremli...@googlegroups.com
Hi Daniel,

Another thing to consider is how many mappers/reducers you are using. Amazon EC2 is a very "hiccuppy" system. Make sure you are not stressing the system, or you will run into bottleneck issues. Try a few mappers/reducers and work your way up.
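
For example, as a starting point (hypothetical values -- the keys are the same Hadoop settings that appear in my properties below):

mapred.map.tasks=2
mapred.reduce.tasks=2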

I just had a reducer fail on a job I'm running right now with the exception you had (see appendix). For some reason, one machine was not even reachable via SSH for a good minute?! … then it kicked back on and all is back to working :| .

Here are my properties if you care:

# input graph parameters
faunus.graph.input.format=org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat
faunus.input.location=friendster/

# output data (graph or statistic) parameters
faunus.graph.output.format=com.thinkaurelius.faunus.formats.titan.cassandra.TitanCassandraOutputFormat
faunus.graph.output.titan.storage.backend=cassandra
faunus.graph.output.titan.storage.hostname=localhost
faunus.graph.output.titan.storage.port=9160
faunus.graph.output.titan.storage.keyspace=titan
faunus.graph.output.titan.storage.batch-loading=true
faunus.graph.output.titan.ids.block-size=100000
faunus.graph.output.titan.storage.idauthority-wait-time=1000
# faunus.graph.output.titan.storage.connection-timeout=60000
# faunus.graph.output.titan.storage.cassandra.thrift.frame_size_mb=49
# faunus.graph.output.titan.storage.cassandra.thrift.max_message_size_mb=50
faunus.graph.output.titan.infer-schema=false
faunus.graph.output.blueprints.tx-commit=10000

mapred.map.tasks=12
mapred.reduce.tasks=12
mapred.map.child.java.opts=-Xmx2G
mapred.reduce.child.java.opts=-Xmx2G
mapred.job.reuse.jvm.num.tasks=-1
mapred.task.timeout=5400000

faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
faunus.output.location=output
faunus.output.location.overwrite=true

HTH,
Marko.

----------------------

java.lang.IllegalArgumentException: Could not instantiate implementation: com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxStoreManager
	at com.thinkaurelius.titan.diskstorage.Backend.getImplementationClass(Backend.java:268)
	at com.thinkaurelius.titan.diskstorage.Backend.getStorageManager(Backend.java:226)
	at com.thinkaurelius.titan.diskstorage.Backend.<init>(Backend.java:97)
	at com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration.getBackend(GraphDatabaseConfiguration.java:406)
	at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.<init>(StandardTitanGraph.java:62)
	at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:40)
	at com.thinkaurelius.faunus.formats.titan.GraphFactory.generateGraph(GraphFactory.java:20)
	at com.thinkaurelius.faunus.formats.BlueprintsGraphOutputMapReduce.generateGraph(BlueprintsGraphOutputMapReduce.java:61)
	at com.thinkaurelius.faunus.formats.BlueprintsGraphOutputMapReduce$Reduce.setup(BlueprintsGraphOutputMapReduce.java:159)
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:650)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at com.thinkaurelius.titan.diskstorage.Backend.getImplementationClass(Backend.java:257)
	... 16 more
Caused by: com.thinkaurelius.titan.diskstorage.TemporaryStorageException: Temporary failure in storage backend
	at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxStoreManager.ensureKeyspaceExists(AstyanaxStoreManager.java:394)
	at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxStoreManager.<init>(AstyanaxStoreManager.java:164)
	... 21 more
Caused by: com.netflix.astyanax.connectionpool.exceptions.NoAvailableHostsException: NoAvailableHostsException: [host=None(0.0.0.0):0, latency=0(0), attempts=0] No hosts to borrow from
	at com.netflix.astyanax.connectionpool.impl.RoundRobinExecuteWithFailover.<init>(RoundRobinExecuteWithFailover.java:31)
	at com.netflix.astyanax.connectionpool.impl.TokenAwareConnectionPoolImpl.newExecuteWithFailover(TokenAwareConnectionPoolImpl.java:74)
	at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:229)
	at com.netflix.astyanax.thrift.ThriftClusterImpl.executeSchemaChangeOperation(ThriftClusterImpl.java:131)
	at com.netflix.astyanax.thrift.ThriftClusterImpl.addKeyspace(ThriftClusterImpl.java:252)
	at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxStoreManager.ensureKeyspaceExists(AstyanaxStoreManager.java:389)
	... 22 more

Daniel Kuppitz

Apr 27, 2013, 1:45:26 PM
to aureliu...@googlegroups.com, gremli...@googlegroups.com
Hm, that's really odd. I've stopped the 3 Cassandra nodes and started a single node with default settings. With this setup everything works as expected, and it's damn fast. However, it doesn't make me happy yet. I'll dig a bit deeper to see what's wrong with the local Cassandra cluster that uses virtual IP addresses. Will post the results later.

Cheers,
Daniel

Daniel Kuppitz

Apr 27, 2013, 4:57:44 PM
to aureliu...@googlegroups.com, gremli...@googlegroups.com
I continued with a single node, and now things got really confusing. First I initialized my graph:

gremlin> g.makeType().name('type').unique(OUT).indexed(Vertex.class).dataType(String.class).makePropertyKey()
==>v[36028797018964170]
gremlin> g.makeType().name('domain').unique(BOTH).indexed(Vertex.class).dataType(String.class).makePropertyKey()
==>v[36028797018964178]
gremlin> requests = g.makeType().name('requests').unique(OUT).dataType(Long.class).makePropertyKey()
==>v[36028797018964186]
gremlin> g.makeType().name('tracks').primaryKey(requests).makeEdgeLabel()
==>v[36028797018964198]
gremlin> g.makeType().name('followed_by').primaryKey(requests).makeEdgeLabel()
==>v[36028797018964206]
gremlin> g.commit()
==>null

Then I successfully imported the data with Faunus. And finally I wanted to query the imported data, but what's this?

gremlin> supernode = g.V('type','supernode').next()
8607 [main] WARN  com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx  - Query requires iterating over all vertices [(v[36028797018963978]=supernode)]. For better performance, use indexes

The graph contains only one node with the type supernode, but the query never finished (at least not within several minutes). What's wrong here? I have an index for the property type, and it's still there, but obviously not used.

gremlin> g.getType('type')
==>v[36028797018963978]

Cheers,
Daniel

Daniel Kuppitz

Apr 27, 2013, 5:22:52 PM
to aureliu...@googlegroups.com, gremli...@googlegroups.com
Just realized that the type "type" is not the one that I created initially (initial: v[36028797018964170], current: v[36028797018963978]).
Seems like the Faunus import dropped everything before it imported the new data... And here's the solution for everybody facing this problem:

When you initialize your graph before importing data, make sure that your Faunus properties file contains the following line:

faunus.graph.output.titan.infer-schema=false

IMO this should be the default.

Cheers,
Daniel

Marko A. Rodriguez

Apr 27, 2013, 10:33:04 PM
to aureliu...@googlegroups.com, gremli...@googlegroups.com
You only have in-edges, and Faunus requires both directions of each edge for BlueprintsGraphOutputMapReduce. Please see the edge-copy property (I'm not at a computer right now). For example, look at bin/script-input.properties. This flag will transpose your graph for you in a MapReduce job. I also talk about that property on the Script Format wiki page.
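
From memory, the relevant line should look something like this (check bin/script-input.properties for the exact key):

faunus.graph.input.edge-copy.direction=IN

Without that flag, each edge has to appear in the GraphSON twice -- as an _inE entry on the target vertex and a matching _outE entry on the source vertex. For illustration, the supernode line from your sample would then also need to carry something like:

{"type":"supernode","_id":0,"_outE":[{"_label":"tracks","_id":11,"_inV":1},{"_label":"tracks","_id":12,"_inV":2}]}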

Good luck,

On Apr 27, 2013, at 5:29 PM, Daniel Kuppitz <daniel....@shoproach.com> wrote:

Next problem: Vertices are imported without a problem, but I don't get a single edge. Here's a simple GraphSON file to reproduce it:

{"type":"supernode","_id":0}
{"name":"d.co.uk","type":"site","_id":1,"_inE":[{"_label":"tracks","_id":11,"_outV":0}]}
{"name":"t.com","type":"site","_id":2,"_inE":[{"_label":"tracks","_id":12,"_outV":0}]}
{"name":"dt.de","type":"site","_id":3,"_inE":[{"_label":"tracks","_id":13,"_outV":0}]}
{"name":"w.com","type":"site","_id":4,"_inE":[{"_label":"tracks","_id":14,"_outV":0}]}
{"name":"dw.net","type":"site","_id":5,"_inE":[{"_label":"tracks","_id":15,"_outV":0}]}
{"name":"tw.net","type":"site","_id":6,"_inE":[{"_label":"tracks","_id":16,"_outV":0}]}
{"name":"dtw.com","type":"site","_id":7,"_inE":[{"_label":"tracks","_id":17,"_outV":0}]}
{"name":"x.co.uk","type":"site","_id":8,"_inE":[{"_label":"tracks","_id":18,"_outV":0}]}
{"name":"dx.net","type":"site","_id":9,"_inE":[{"_label":"tracks","_id":19,"_outV":0}]}
{"name":"tx.de","type":"site","_id":10,"_inE":[{"_label":"tracks","_id":20,"_outV":0}]}

I've also tried the Graph of Gods sample GraphSON and it works, but I really don't see a difference; my sample has even fewer edges than the Graph of Gods.

Cheers,
Daniel

Daniel Kuppitz

Apr 28, 2013, 4:29:31 AM
to aureliu...@googlegroups.com, gremli...@googlegroups.com
OK, adding

faunus.graph.input.edge-copy.direction=IN

to the configuration solved the problem. However, it would be great if I could also set

faunus.graph.input.edge-copy.direction=BOTH

so that I would only have to include one direction in my input file.

Cheers,
Daniel