I'm working on the storm-maven-plugin and have hit a wall related to how topologies are submitted to clusters (local and remote).
The usage scenarios for the plugin can be found here: https://github.com/ptgoetz/storm-maven-plugin
Currently the convention is to define a class (which I'll call a "driver") with a main() method that builds the topology and then decides whether to deploy to a local cluster or a remote cluster.
For the Maven plugin, I was hoping to have separate goals for running locally ("mvn storm:run") or submitting to a cluster ("mvn storm:jar"). The problem here is that since the local vs. remote logic is tied up in the main() method, I don't have control over it.
Okay, so for a second let's forget about separate run/jar options and assume that the "driver" class always decides how to deploy the topology.
If we let the driver class make the choice, I was thinking the plugin mojo could just call the driver class's main() method. I went down this path, but ran into two issues:
1. "StormSubmitter" gets the path to the jar file from the "STORM_JAR" environment variable. To get past that I either have to rely on the maven user setting the variable, or I have to use java.lang.ProcessBuilder to spawn a JVM to run the driver class' main() method (not very elegant, but should work).
2. "StormSubmitter" via "Utils.java" will pick up the nimbus host parameter from ~/.storm/storm.yaml, so to be able to make nimbus host configurable from the maven environment I would have to do something like: move ~/.storm/storm.yaml if it exists; write a temporary storm.yaml; run the deploy process; move the original back.
So as you can see (if I explained that well -- which I don't think I did ;) ), things start to fall apart and get pretty kludgy with the current model, and some of the features I have planned for the plugin start to fall by the wayside. I'd really rather not muck with the user's environment if possible.
Ideally, I'd like to be able to use the StormSubmitter and NimbusClient classes directly, and use Maven properties to build up the backtype.storm.Config object.
To do that I need some way to access the StormTopology instance for a class. Since, in the current model, the StormTopology instance is created in the main() method, I have no way of doing that.
So I guess what I'm requesting/proposing is something akin to Hadoop's Tool/ToolRunner. A "driver" class might look something like this (TopologyDriver would be a new interface that defines a buildTopology() method):
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.generated.StormTopology;
import backtype.storm.topology.TopologyBuilder;

public class MyTopologyDriver implements TopologyDriver {

    public StormTopology buildTopology(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        // build the topology…
        return builder.createTopology();
    }

    public static void main(String[] args) throws Exception {
        TopologyDriver driver = new MyTopologyDriver();
        StormTopology topology = driver.buildTopology(args);
        Config conf = new Config();
        StormSubmitter.submitTopology("my-topo", conf, topology);
    }
}
That's just an example, not a thought-out design; the idea is to expose the topology instance outside of a main() method so other tools can access it.
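To make that concrete, here's a rough sketch (again, not a worked-out design) of how the plugin mojo might consume such an interface: load the driver class reflectively, ask it for the topology, build the Config from Maven properties, and submit. The method signature and the nimbusHost parameter are hypothetical plugin configuration:

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.generated.StormTopology;

public class SubmitFromMojo {

    public static void submit(String driverClassName, String topologyName,
                              String nimbusHost, String[] args) throws Exception {
        // instantiate the user's driver class by name (a Maven plugin parameter)
        TopologyDriver driver =
                (TopologyDriver) Class.forName(driverClassName).newInstance();
        StormTopology topology = driver.buildTopology(args);

        // build the Config from Maven properties instead of ~/.storm/storm.yaml
        Config conf = new Config();
        conf.put(Config.NIMBUS_HOST, nimbusHost);

        StormSubmitter.submitTopology(topologyName, conf, topology);
    }
}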
Hopefully that makes sense. :)
Thanks,
- Taylor
I thought about this for awhile. Here's a question: what does it mean to choose between running a topology locally vs. remotely? When you submit a topology remotely, it runs forever. So what do you do in local mode? Do you just ctrl-c it when you're done watching it? What about cases like Distributed RPC, where testing it locally means setting up a LocalDRPC server and issuing DRPC requests to it, much more than just running a topology?
I have some potential solutions for #1 and #2. For #1, I can change how the storm jar is passed in by using a java property rather than an environment variable. So the jar would be passed in like this:

java -Dstorm.jar={path to jar} ...

Would this make it easier to integrate with Maven?
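If it would, something like this could set the property in-process before submitting, with no child JVM and no changes to the user's environment (the "storm.jar" property name is just the one proposed above, not current behavior):

import java.io.File;

public class JarProperty {

    public static void pointStormAtJar(File topologyJar) {
        // proposed java property, read instead of the STORM_JAR environment variable
        System.setProperty("storm.jar", topologyJar.getAbsolutePath());
        // StormSubmitter.submitTopology(...) could then run in the same JVM
    }
}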
For #2, there are a few options. First of all, Storm doesn't explicitly look in ~/.storm for the storm.yaml file. The way Storm works is that it searches for storm.yaml on the classpath. The "storm" client adds ~/.storm to the classpath. So one solution to your problem would be to put the storm.yaml somewhere within the project and just add its directory to the classpath using Maven.

Another solution I'm considering is creating the possibility to set configurations using java properties rather than requiring that nimbus.host be set in a storm.yaml file. So something like this:

java -Dstorm:nimbus.host={location of nimbus} -Dstorm:nimbus.thrift.port={override the port here}

Let me know your opinion between these two approaches.
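Purely as an illustration of how that second option might be consumed on the plugin side (this is a guess at the mechanism, nothing like it exists in Storm today), the prefixed properties could be copied into the topology Config:

import backtype.storm.Config;

public class PropertyOverrides {

    public static Config fromSystemProperties() {
        Config conf = new Config();
        for (String key : System.getProperties().stringPropertyNames()) {
            if (key.startsWith("storm:")) {
                // strip the prefix; numeric values would still need converting
                conf.put(key.substring("storm:".length()), System.getProperty(key));
            }
        }
        return conf;
    }
}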
As for having a ToolRunner type interface, I don't see anything that's stopping you from doing that yourself. There's no reason why that really has to be a first class concept for Storm. I prefer to provide the libraries for creating topologies and submitting them, and then you can manage that process yourself.
-Nathan