Designing API for a Markov Model

RJ Nowling

Sep 13, 2014, 10:28:10 PM

to clo...@googlegroups.com

Hi all,

I'm new to Clojure and implementing a Markov Model as part of a larger project. I'd like some advice on the API for a progress-state function.

I see two possible options. In the first option, we always ask the user to provide and keep track of the MSM state themselves:

(progress-state markov-model previous-state) -> new-state

In the second approach, we create a record that combines a model and a current state:

(defrecord MarkovProcess [model current-state])

(progress-state markov-process) -> updated-markov-process, new-state

Which of these approaches is more idiomatic for Clojure? Is returning multiple values an accepted practice in Clojure? Is there a third, better way?

Thanks in advance!

RJ

Jony Hudson

Sep 14, 2014, 6:00:20 AM

to clo...@googlegroups.com

It's nice if the function returns the same sort of data as it consumes, because then it's easy to repeat it with `iterate` or `reduce`. So, if you take your first example, then you could write:

(take 100
  (iterate (partial progress-state markov-model) initial-state))

to get the first 100 states, starting with initial-state itself.
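A minimal sketch of how this could look end to end. The model representation (a map from each state to a map of next-state probabilities), the name `weather-model`, and the sampling code are assumptions for illustration, not something from the original post:

```clojure
;; Hypothetical model: state -> {next-state probability}.
;; Each inner map is assumed to sum to 1.0.
(def weather-model
  {:sunny {:sunny 0.9 :rainy 0.1}
   :rainy {:sunny 0.5 :rainy 0.5}})

(defn progress-state
  "Samples the next state from the transition row of `previous-state`.
  Assumes `previous-state` is a key of `model`."
  [model previous-state]
  (let [transitions (get model previous-state)
        r           (rand)]
    (or
     ;; Walk the cumulative probabilities and pick the first state
     ;; whose cumulative probability exceeds the random draw.
     (some (fn [[next-state cumulative]]
             (when (< r cumulative) next-state))
           (map vector (keys transitions) (reductions + (vals transitions))))
     ;; Fallback in case floating-point rounding leaves the total below r.
     (last (keys transitions)))))

;; Same data type in and out, so it composes with iterate.
(take 100 (iterate (partial progress-state weather-model) :sunny))
```

Because `progress-state` returns the same kind of value it consumes, the caller decides how many steps to take and what to do with the sequence.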

If the process takes information at each step, e.g.:

(progress-state-with-input markov-model-2 previous-state current-input) -> new-state

then you can do a similar thing with `reductions`:

(take 100
  (reductions (partial progress-state-with-input markov-model-2) initial-state inputs))
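To show the shape of the `reductions` version, here is a toy sketch. It is deliberately deterministic (a plain lookup rather than a random draw) so the result is easy to follow; `door-model` and the body of `progress-state-with-input` are made up for illustration:

```clojure
;; Hypothetical model: state -> {input next-state}.
(def door-model
  {:closed {:open  :open}
   :open   {:close :closed}})

(defn progress-state-with-input
  "Looks up the next state for `current-input`; stays in
  `previous-state` when the input has no transition."
  [model previous-state current-input]
  (get-in model [previous-state current-input] previous-state))

;; reductions yields the initial state followed by one state per input.
(reductions (partial progress-state-with-input door-model)
            :closed
            [:open :open :close])
;; => (:closed :open :open :closed)
```

Note that the result includes the initial state, and a finite `inputs` sequence bounds the length on its own.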

I'd prefer that to your second approach, as I don't think there's much reason to bundle the process and its state.

Another question to ponder is whether there should be a progress-state function, or whether the model itself could be a function. If the mechanics of the process are somewhat generic, and the `markov-model` is just data, then it's good as it is. But I'd make sure that progress-state isn't just an empty wrapper.
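As a rough sketch of the "model itself could be a function" idea: the constructor below closes over the transition data and returns a one-argument step function. The name `make-model` and the simplification to uniform transition probabilities (each successor equally likely) are assumptions made to keep the example short:

```clojure
(defn make-model
  "Hypothetical constructor: takes a map of state -> vector of possible
  next states and returns a function state -> next state. Successors are
  chosen uniformly at random; weighted sampling would replace rand-nth."
  [successors]
  (fn [state]
    (rand-nth (get successors state))))

(def coin
  (make-model {:heads [:heads :tails]
               :tails [:heads :tails]}))

;; The model is now directly usable with iterate, no wrapper needed.
(take 10 (iterate coin :heads))
```

Whether this is preferable depends on whether callers ever need to inspect the model as data; a closure hides the transition table.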

Jony

RJ Nowling

Sep 14, 2014, 11:18:30 AM

to clo...@googlegroups.com

Thanks for the response!

You make a really good point about the first interface -- it makes it easy to use with the built-in functions.

The only things that really define the process are the model (a transition matrix) and the current state. The model doesn't change but the current state does. The next state is always chosen randomly based on probabilities that come from knowing the previous state.

My main argument for bundling the model and state together is that I don't want a user to pass in the wrong state or a state from a different process. Is there a Clojure opinion on this sort of thing?

I was thinking a third approach could be to have:

(progress-state process1) -> process2

and

(get-state process2) -> state

so that progressing the state and getting the current state are decoupled.
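A minimal sketch of this third approach, reusing the `MarkovProcess` record from the first message. The model format (state -> vector of equally likely successors) is an assumption chosen to keep the sampling to one line:

```clojure
(defrecord MarkovProcess [model current-state])

(defn progress-state
  "Returns a new MarkovProcess whose current state is sampled from the
  successors of the old one. Assumes uniform transition probabilities."
  [process]
  (let [{:keys [model current-state]} process]
    (assoc process :current-state (rand-nth (get model current-state)))))

(defn get-state
  "Reads the current state out of a process."
  [process]
  (:current-state process))

;; Process in, process out, so iterate still works; states are read
;; separately with get-state.
(def process1 (->MarkovProcess {:a [:b] :b [:a]} :a))
(map get-state (take 5 (iterate progress-state process1)))
;; => (:a :b :a :b :a)
```

Since records are immutable, `process1` is unchanged by `progress-state`, and a state can only reach the function bundled with its own model.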

--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to a topic in the Google Groups "Clojure" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/clojure/t7th1wY-Vos/unsubscribe.
To unsubscribe from this group and all its topics, send an email to clojure+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Christopher Small

Sep 14, 2014, 7:16:13 PM

to clo...@googlegroups.com

A few questions out of curiosity, if you don't mind:

* Have you looked at existing MM libraries for Clojure?

* Is there something you need that others don't currently offer/emphasize, or is this more of a learning project?

* Are you planning on or interested in open sourcing your work?

Best

Chris

RJ Nowling

Sep 14, 2014, 10:06:26 PM

to clo...@googlegroups.com

Hi Chris,

I'm more than happy to answer questions.

General background for the project: My friend Jay Vyas initiated BigPetStore, a big data application blueprint for the Hadoop ecosystem centered around transaction data for a fictional chain of pet stores. BigPetStore is currently part of the Apache BigTop distribution.

I developed a much more advanced data generator that uses ab initio modeling of customer behavior to embed patterns complex enough for use in analytics. I developed the data generator in Python, made it available under the Apache 2.0 license, and currently have an associated conference paper under review:

https://github.com/rnowling/bigpetstore-data-generator/tree/branch-0.2

My next step is to rewrite the generator in a JVM language for integration with Hadoop and Spark and contribute it to BigTop. I'm very comfortable with Java but I'm also rewriting parts in Clojure and Scala to get a feel for whether they would make better fits. If the Clojure / Scala ports reach nearly complete status, I'll happily release them as well.

In general, I'm curious about the state of math modeling and machine learning libraries on the JVM. Incanter is nice but it seems to be missing Hidden Markov Models, Monte Carlo methods, numerical integrators for differential equations, and common machine learning methods.

I'm only using Markov models, not HMMs, though. MMs are simple enough that I can implement the functionality in less than 100 lines of code. However, if you know of a good library, I'm happy to take a look.

Thanks!

Quzanti

Sep 15, 2014, 5:18:04 AM

to clo...@googlegroups.com

Is there a limited number of models?

The model should stay decoupled from the state, as they are totally distinct, so:

(general-fn model old-state) -> new-state

Then either you (if you know the model) or the user (if they can choose any model) should define a partial fn:

(def specific-model-fn (partial general-fn model))

so that it is always clear which model is being used. That idiot-proofs the situation, which was your worry.

The partial fn can then be used with all the Clojure goodies such as iteration and reduction, as Jony Hudson has pointed out.

All I am adding to the debate is the suggestion of a partial fn named after the model you are using, for clarity.
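A short sketch of the named-partial idea. The names `weather-model` and `progress-weather` are hypothetical, and the model format (state -> vector of successors, with repeated entries standing in for higher probability) is a simplification for illustration:

```clojure
(defn progress-state
  "Generic step: picks a successor of `old-state` uniformly at random.
  Repeating a successor in the vector makes it proportionally more likely."
  [model old-state]
  (rand-nth (get model old-state)))

;; Hypothetical model: sunny days stay sunny two times out of three.
(def weather-model
  {:sunny [:sunny :sunny :rainy]
   :rainy [:sunny :rainy]})

;; A step function named after the model it closes over.
(def progress-weather (partial progress-state weather-model))

(take 5 (iterate progress-weather :sunny))
```

Callers of `progress-weather` never handle the model directly, so they cannot pair a state with the wrong model by accident, although nothing stops them from passing a state that belongs to some other process.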

RJ Nowling

Sep 15, 2014, 9:43:40 AM

to clo...@googlegroups.com

Consensus seems to be:

(progress-state model old-state) -> new-state

and use partial application (`partial`) to create a function that closes over the model. Thanks!

