Hey Anthony,
I've had very good luck with Pacer and Titan/Cassandra (in fact, I've monkeyed around with all the backend options except Hadoop, plus a recently introduced datastore called Persistit). The coupling you're concerned about here is between Pacer and Titan, and that works great; I have been super happy with it. You can get started by doing something like this:
config = org.apache.commons.configuration.MapConfiguration.new(
  "storage.backend" => "embeddedcassandra",
  "storage.cassandra-config-dir" => "file://#{File.absolute_path(File.dirname(__FILE__))}/config/environments/#{ENV['RACK_ENV']}/cassandra.yaml",
  "storage.index.search.backend" => "elasticsearch",
  "storage.index.search.directory" => "/tmp/searchindex",
  "storage.index.search.client-only" => "false",
  "storage.index.search.local-mode" => "true"
)
G = Pacer.titan config
The code above references a Pacer module that looks like this:
require "java"
require "pacer"
require "titan/titan_persistent_object"

module Pacer
  class << self
    attr_reader :titan_g

    # Returns a proc that PacerGraph calls to obtain the underlying Titan graph.
    def open(path)
      proc do
        graph = Pacer.open_graphs[path]
        unless graph
          # java_send selects the TitanFactory.open(Configuration) overload explicitly.
          args = [org.apache.commons.configuration.Configuration.java_class]
          @titan_g = graph = com.thinkaurelius.titan.core.TitanFactory.java_send(:open, args, @config)
        end
        graph
      end
    end

    # Returns a proc that PacerGraph calls to shut the graph down.
    def shutdown(path)
      proc do |graph|
        graph.blueprints_graph.shutdown
        Pacer.open_graphs.delete path
      end
    end

    # Entry point: wraps a Titan graph built from config in a PacerGraph.
    def titan(config, path="foo")
      @config = config
      PacerGraph.new(Pacer::SimpleEncoder, open(path), shutdown(path))
    end

    private :open, :shutdown
  end
end
This assumes you're running Cassandra embedded, which used to be the recommended method. The Titan folks now recommend running Cassandra as a separate process (and are likely to drop embedded Cassandra in a near-future release), so you'll need to tweak that config just a hair; the documentation for that on the Titan wiki is pretty good.
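As a rough sketch of that tweak (not something I've copied from a working setup): for a standalone Cassandra process, the two storage settings below are the ones I'd expect to change. The helper name remote_cassandra_settings and the default hostname are made up for illustration, and any index settings you use would be carried over separately; check the option names against the Titan wiki for your version.

```ruby
# Hypothetical helper: Titan storage settings for a Cassandra node running
# as a separate process. The default hostname is an assumption.
def remote_cassandra_settings(hostname = "127.0.0.1")
  {
    "storage.backend"  => "cassandra", # instead of "embeddedcassandra"
    "storage.hostname" => hostname     # where the Cassandra process listens
  }
end

settings = remote_cassandra_settings

# Under JRuby, with the Titan jars on the classpath, the hash would be
# wrapped the same way as the embedded config:
#   config = org.apache.commons.configuration.MapConfiguration.new(settings)
#   G = Pacer.titan config
```

Note that the storage.cassandra-config-dir entry goes away here, since the external Cassandra process reads its own cassandra.yaml.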
There is one big gotcha that bit me and cost me a fair amount of time with Cassandra/Titan specifically. You'll be running with JRuby, and when running with a backend data store like Berkeley DB, I was able to do something like this within my Ruby source to load all the dependencies dynamically at runtime:
Dir["#{Main.root}/titan-berkeleyje-0.3.0/lib/*.jar"].each { |jar| require jar }
For reasons I can't really be sure of, that will *not* work with Cassandra. My best theory is that Cassandra has dependencies on some jars with native components inside them, and those pieces don't load/execute correctly when classloaded as above (it also makes things a pain if you're trying to run it inside a container like JBoss). At any rate, Cassandra complains in all kinds of ways unless you're very explicit about setting up your classpath at launch time. I ended up creating a script to launch my application that looks something like:
#!/bin/sh
java -Xmx3072m -XX:MaxPermSize=2048m -classpath \
lib/java/akiban-persistit-3.3.0.jar:\
lib/java/blueprints-core-2.4.0.jar:\
lib/java/titan-core-0.4.0-SNAPSHOT.jar:\
lib/java/titan-persistit-0.4.0-SNAPSHOT.jar:\
...
lib/java/titan-cassandra-0.3.1.jar:\
lib/java/titan-core-0.3.1.jar org.jruby.Main -S lib/ruby/main.rb "$@"
Of course, if you're packaging your project in its own runnable jar, you can hide all of this in your project's jar's manifest, but when developing, I wanted my code exploded out as uncompiled Ruby source, and I wanted ease of launching it, which this script does.
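If maintaining that jar list by hand gets tedious, one possible variation (a sketch, not what I actually run) is a small plain-Ruby launcher that assembles the same java command from whatever jars sit in a directory. The helper names build_classpath and launch_command are made up for illustration. It assumes every jar in the directory belongs on the classpath and that alphabetical order is acceptable, which may not hold if you keep several versions of the same jar side by side.

```ruby
# Hypothetical helper: join every jar in jar_dir into a classpath string.
# Sorting keeps the order stable from run to run.
def build_classpath(jar_dir)
  Dir[File.join(jar_dir, "*.jar")].sort.join(File::PATH_SEPARATOR)
end

# Hypothetical helper: build the launch command as an argument array,
# using the same JVM flags and JRuby entry point as the shell script.
def launch_command(jar_dir, script, args = [])
  [
    "java", "-Xmx3072m", "-XX:MaxPermSize=2048m",
    "-classpath", build_classpath(jar_dir),
    "org.jruby.Main", "-S", script, *args
  ]
end

# To actually launch, hand the array to exec so the JVM replaces this process:
#   exec(*launch_command("lib/java", "lib/ruby/main.rb", ARGV))
```

The classpath is still fixed before the JVM starts, which is the part that mattered for Cassandra; only the bookkeeping moves out of the shell script.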
I hope this helps you avoid some of the pains I went through. I also subscribed to the titan-users mailing list, which is very active, and like this list, full of helpful people.
Good luck!
Mark