Hi Chris,
I'm more than happy to answer questions.
General background for the project: My friend Jay Vyas initiated BigPetStore, a big data application blueprint for the Hadoop ecosystem centered around transaction data for a fictional chain of pet stores. BigPetStore is currently part of the Apache BigTop distribution.
I developed a much more advanced data generator that uses ab initio modeling of customer behavior to embed patterns complex enough for use in analytics. I developed the data generator in Python, made it available under the Apache 2.0 license, and currently have an associated conference paper under review:
My next step is to rewrite the generator in a JVM language for integration with Hadoop and Spark and contribute it to BigTop. I'm very comfortable with Java, but I'm also rewriting parts in Clojure and Scala to get a feel for whether either would be a better fit. If the Clojure / Scala ports get close to complete, I'll happily release them as well.
In general, I'm curious about the state of mathematical modeling and machine learning libraries on the JVM. Incanter is nice, but it seems to be missing Hidden Markov Models, Monte Carlo methods, numerical integrators for differential equations, and common machine learning methods.
I'm only using plain Markov models (MMs), not HMMs, though. MMs are simple enough that I can implement the functionality in fewer than 100 lines of code. However, if you know of a good library, I'm happy to take a look.
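To give a sense of how little code a plain Markov model needs, here is a rough sketch in Python. It is not the code from my generator: the class name `MarkovModel`, its methods, and the product-category states are all made up for illustration, and it assumes a first-order chain over discrete states with transition probabilities estimated from observed counts.

```python
import random
from collections import defaultdict


class MarkovModel:
    """First-order Markov chain over hashable states (illustrative sketch only)."""

    def __init__(self, seed=None):
        # counts[current][next] = number of observed current -> next transitions
        self._counts = defaultdict(lambda: defaultdict(int))
        self._rng = random.Random(seed)

    def fit(self, sequences):
        """Accumulate transition counts from an iterable of state sequences."""
        for seq in sequences:
            for current, nxt in zip(seq, seq[1:]):
                self._counts[current][nxt] += 1
        return self

    def transition_probs(self, state):
        """Return {next_state: probability}; empty if the state has no outgoing transitions."""
        outgoing = self._counts.get(state, {})
        total = sum(outgoing.values())
        if total == 0:
            return {}
        return {nxt: n / total for nxt, n in outgoing.items()}

    def sample(self, start, length):
        """Random walk of at most `length` states, stopping early at a dead end."""
        path = [start]
        while len(path) < length:
            probs = self.transition_probs(path[-1])
            if not probs:
                break
            states, weights = zip(*probs.items())
            path.append(self._rng.choices(states, weights=weights)[0])
        return path


# Hypothetical purchase histories; the category names are invented.
histories = [
    ["food", "treats", "food"],
    ["food", "food", "toys"],
]
model = MarkovModel(seed=0).fit(histories)
print(model.transition_probs("food"))
print(model.sample("food", 5))
```

A port to Java, Clojure, or Scala would have roughly the same shape: a nested map of counts plus a weighted random choice.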
Thanks!