I'm trying to decide whether Factorie can handle a project I'm working on and, therefore, whether it's worth investing time in learning it.
I've been applying variations of hidden Markov models (HMMs) to text. These variations include multiple sequences and multiple emissions per hidden state, and I may eventually want to run hierarchical HMMs, possibly with automatic discovery of the hierarchy structure (dynamic Bayesian networks, or DBNs), Dirichlet process priors, and so forth. My main difficulty is inference. Perhaps I should approach the problem with conditional random fields (CRFs) instead, although I don't have large amounts of data, and it's hard to see which feature variables I could add that would help.
I need software that lets me identify phrases, paragraphs, documents, and speakers and model each of them somewhat differently. It's easy enough for me to, say, set up an R data file containing the emissions (words) along with variables that identify each of these units. In Stan, for example, I can write the model so that it takes these identifiers into account and adjusts the probability function accordingly. Unfortunately, I don't think Stan can handle the 10k+ parameters I'm likely to need for a real problem. EM might work better, and Factorie appears to have EM support, but it would need to be quite flexible to cover these cases.
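For reference, here is roughly what one EM iteration (Baum-Welch) for a plain discrete HMM over several text sequences involves. This is a minimal sketch in NumPy, not Factorie's API; the function names `forward_backward` and `baum_welch_step` are hypothetical, and it assumes words are already mapped to integer indices. With K hidden states and a vocabulary of V words, the emission matrix alone has K·V entries (e.g. 20 × 1,000 = 20,000), yet each M-step is just a closed-form normalisation of expected counts, which is one reason EM copes with that many parameters.

```python
import numpy as np


def forward_backward(pi, A, B, obs):
    """Scaled forward-backward pass for a discrete HMM (hypothetical helper).

    pi:  (K,)   initial state probabilities
    A:   (K, K) transitions, A[i, j] = P(z_{t+1} = j | z_t = i)
    B:   (K, V) emissions,   B[k, w] = P(x_t = w | z_t = k)
    obs: list of word indices in [0, V)
    Returns state posteriors gamma (T, K), pairwise posteriors xi (T-1, K, K)
    and the log-likelihood of the sequence.
    """
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K))
    beta = np.zeros((T, K))
    c = np.zeros(T)  # per-step scaling factors, to avoid numerical underflow

    # Forward pass, normalising alpha at every step.
    alpha[0] = pi * B[:, obs[0]]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]

    # Backward pass, reusing the forward scaling factors.
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1]) / c[t + 1]

    gamma = alpha * beta  # each row sums to 1 under this scaling
    xi = np.zeros((T - 1, K, K))
    for t in range(T - 1):
        emit = B[:, obs[t + 1]] * beta[t + 1]
        xi[t] = alpha[t][:, None] * A * emit[None, :] / c[t + 1]
    return gamma, xi, np.log(c).sum()


def baum_welch_step(pi, A, B, sequences):
    """One EM iteration over several sequences; returns updated parameters
    and the total log-likelihood under the *old* parameters."""
    K, V = B.shape
    pi_num = np.zeros(K)
    A_num = np.zeros((K, K))
    B_num = np.zeros((K, V))
    total_ll = 0.0
    for obs in sequences:  # E-step: accumulate expected counts
        gamma, xi, ll = forward_backward(pi, A, B, obs)
        total_ll += ll
        pi_num += gamma[0]
        A_num += xi.sum(axis=0)
        for t, w in enumerate(obs):
            B_num[:, w] += gamma[t]
    # M-step: normalise expected counts; a tiny constant avoids division by zero.
    eps = 1e-12
    pi_new = pi_num / pi_num.sum()
    A_new = (A_num + eps) / (A_num + eps).sum(axis=1, keepdims=True)
    B_new = (B_num + eps) / (B_num + eps).sum(axis=1, keepdims=True)
    return pi_new, A_new, B_new, total_ll


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, V = 3, 5
    pi = np.full(K, 1.0 / K)
    A = rng.dirichlet(np.ones(K), size=K)
    B = rng.dirichlet(np.ones(V), size=K)
    docs = [[0, 1, 2, 3, 4, 4, 2], [1, 1, 0, 3], [2, 4, 4, 0, 1]]
    for it in range(10):
        pi, A, B, ll = baum_welch_step(pi, A, B, docs)
        print(f"iteration {it}: log-likelihood {ll:.4f}")
```

The log-likelihood printed at each iteration should never decrease, which is a useful sanity check. As a hypothetical extension, speaker- or document-specific emissions could be modelled by keeping one emission matrix per speaker and selecting it per token; the E-step recursion stays the same and only the M-step counts are split by speaker.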
Does anyone have thoughts on whether Factorie is a good way to approach this?