Added:
/wiki/hadoop.wiki
=======================================
--- /dev/null
+++ /wiki/hadoop.wiki Tue Jan 24 04:46:36 2012
@@ -0,0 +1,28 @@
+#summary Remote hadoop job deployment
+
+= Introduction =
+
+Virgil allows you to deploy Hadoop jobs to a remote cluster. This page describes how.
+
+= Details =
+
+Similar to the different run modes for Cassandra, Virgil supports two modes for Hadoop. When started with "bin/virgil", Hadoop jobs run locally within the Virgil JVM. This is convenient for testing, but when executing against a large dataset, the jobs should be deployed to a remote cluster.
+
+To deploy jobs to a remote cluster, edit the configuration in $VIRGIL_HOME/mapreduce/conf. These are the same three configuration files found in $HADOOP_HOME/conf. If Virgil and Hadoop are running on the same machine, you can simply symlink to those files.
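+
+If Virgil and Hadoop do share a machine, the symlinking might look like the sketch below. The three file names (core-site.xml, hdfs-site.xml, mapred-site.xml) are assumptions based on a typical Hadoop conf directory; the demo uses throwaway directories so it can run standalone, but in practice HADOOP_HOME and VIRGIL_HOME would point at your real installs.
+
+{{{
+# Demo sketch: throwaway directories stand in for real installs.
+HADOOP_HOME=$(mktemp -d)
+VIRGIL_HOME=$(mktemp -d)
+mkdir -p "$HADOOP_HOME/conf" "$VIRGIL_HOME/mapreduce/conf"
+
+# Assumed file names; adjust to match your Hadoop version.
+for f in core-site.xml hdfs-site.xml mapred-site.xml; do
+  touch "$HADOOP_HOME/conf/$f"   # stand-in for the real config file
+  ln -sf "$HADOOP_HOME/conf/$f" "$VIRGIL_HOME/mapreduce/conf/$f"
+done
+}}}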
+
+After changing the configuration, start Virgil with:
+
+{{{
+bin/virgil-hadoop -host $CASSANDRA_HOST
+}}}
+
+This version of the shell script puts the Hadoop configuration on Virgil's classpath. Hadoop reads its configuration from the classpath and will deploy the job to the remote cluster described by those configuration files.
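+
+One way to sanity-check which cluster those files describe is to inspect the JobTracker address in mapred-site.xml. This is only a sketch: the mapred.job.tracker property name assumes a Hadoop 0.20/1.x-era cluster, and the sample file written below stands in for your real mapred-site.xml.
+
+{{{
+# Sketch: the sample file stands in for your real mapred-site.xml;
+# mapred.job.tracker is the Hadoop 0.20/1.x JobTracker property
+# (an assumption about your Hadoop version).
+VIRGIL_HOME=$(mktemp -d)
+mkdir -p "$VIRGIL_HOME/mapreduce/conf"
+cat > "$VIRGIL_HOME/mapreduce/conf/mapred-site.xml" <<'EOF'
+<configuration>
+  <property>
+    <name>mapred.job.tracker</name>
+    <value>jobtracker.example.com:8021</value>
+  </property>
+</configuration>
+EOF
+
+# Show which JobTracker deployed jobs will target.
+grep -A 1 "mapred.job.tracker" "$VIRGIL_HOME/mapreduce/conf/mapred-site.xml"
+}}}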
+
+To test your setup, the Virgil release now includes an example in $VIRGIL_HOME/example. Within that directory, you should be able to run the following:
+
+{{{
+./insert-data.sh
+./run-mapreduce.sh
+}}}
+
+That runs the example described on the