Factorie for CRF / complex HMMs

Peter Muhlberger

Dec 15, 2016, 3:37:36 PM
to Factorie
I'm trying to determine whether Factorie can handle a project I'm undertaking and, therefore, whether to sink time into learning how to use Factorie.

I've been applying variations of hidden Markov models (HMMs) to text. These include adding multiple sequences and multiple emissions per hidden state, and I may eventually want to run hierarchical HMMs, possibly with automatic discovery of the hierarchy structure (dynamic Bayesian networks, DBNs), Dirichlet process priors, and so forth. My problem is primarily inference. Perhaps I should be approaching the problem through CRFs, though I should note that I don't have vast amounts of data, and it's hard to see which feature variables I could include that would be helpful.
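To make "multiple emissions per hidden state" concrete: if each hidden state emits several words at a step and, as an assumption, those words are conditionally independent given the state, the per-step emission log-likelihood is simply a sum of per-word log-probabilities. A minimal plain-Python sketch (not Factorie's API; the helper name `emission_loglik` and the table layout `log_phi[k][w]` are hypothetical):

```python
import math

def emission_loglik(words_per_step, log_phi):
    """Hypothetical helper: build a T x K table of emission log-likelihoods.

    words_per_step[t] is the list of word ids emitted at step t.
    log_phi[k][w] is log p(word w | hidden state k). Words at one step are
    ASSUMED conditionally independent given the state, so their logs add.
    """
    return [
        [sum(log_phi_k[w] for w in words) for log_phi_k in log_phi]
        for words in words_per_step
    ]

# Usage: two hidden states, a two-word vocabulary (ids 0 and 1).
log_phi = [[math.log(0.9), math.log(0.1)],
           [math.log(0.2), math.log(0.8)]]
table = emission_loglik([[0, 0], [1]], log_phi)
```

Each row of `table` can then stand in for the single-word emission term in the usual HMM forward or Baum-Welch recursions.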

I need software that lets me identify phrases, paragraphs, documents, and speakers and model each somewhat differently. It's easy enough for me to, say, set up an R file with emissions (words) and other variables that identify the above. And in Stan, for example, I can write the model so that it takes these identifiers into account and adjusts the probability function in the desired ways. Unfortunately, I don't think Stan can handle the 10k+ parameters I'm likely to need in a real problem. EM might be better, and it looks like Factorie has an EM facility, though it would need to be quite flexible.
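For reference, EM for a plain discrete HMM is the Baum-Welch algorithm: the E-step runs forward-backward to get posterior state marginals, and the M-step re-estimates the initial, transition, and emission probabilities from those expected counts. The sketch below is plain Python for a single observation sequence (length at least 2) with one emitted symbol per step; it is not Factorie's EM facility, and the function names are hypothetical. Handling multiple sequences would mean summing the expected counts across sequences before normalizing.

```python
import math

def forward_backward(obs, pi, A, B):
    """E-step for a discrete HMM using scaled forward-backward.

    pi[k]: initial probs; A[j][k]: p(z_{t+1}=k | z_t=j); B[k][v]: p(obs=v | z=k).
    Returns gamma[t][k] = p(z_t=k | obs), xi[t][j][k] = p(z_t=j, z_{t+1}=k | obs)
    and the log-likelihood log p(obs).
    """
    T, K = len(obs), len(pi)
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            a = [pi[k] * B[k][obs[0]] for k in range(K)]
        else:
            prev = alpha[-1]
            a = [B[k][obs[t]] * sum(prev[j] * A[j][k] for j in range(K))
                 for k in range(K)]
        c = sum(a)  # per-step scaling factor keeps alpha from underflowing
        alpha.append([x / c for x in a])
        scale.append(c)
    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[j][k] * B[k][obs[t + 1]] * beta[t + 1][k] for k in range(K))
                   / scale[t + 1] for j in range(K)]
    gamma = [[alpha[t][k] * beta[t][k] for k in range(K)] for t in range(T)]
    xi = [[[alpha[t][j] * A[j][k] * B[k][obs[t + 1]] * beta[t + 1][k] / scale[t + 1]
            for k in range(K)] for j in range(K)] for t in range(T - 1)]
    # The product of the scaling factors equals p(obs).
    return gamma, xi, sum(math.log(c) for c in scale)

def normalize(row):
    total = sum(row)
    return [x / total for x in row]

def baum_welch(obs, pi, A, B, n_iter=20):
    """Run n_iter EM iterations; returns updated (pi, A, B) and log-likelihoods."""
    K, V = len(pi), len(B[0])
    history = []
    for _ in range(n_iter):
        gamma, xi, loglik = forward_backward(obs, pi, A, B)  # E-step
        history.append(loglik)
        # M-step: normalize expected counts into probabilities.
        pi = normalize(gamma[0])
        A = [normalize([sum(x[j][k] for x in xi) for k in range(K)]) for j in range(K)]
        B = [normalize([sum(g[k] for g, o in zip(gamma, obs) if o == v) for v in range(V)])
             for k in range(K)]
    return pi, A, B, history

# Usage: a toy binary sequence and a 2-state starting point.
obs = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0]
pi, A, B, history = baum_welch(obs, [0.6, 0.4],
                               [[0.7, 0.3], [0.4, 0.6]],
                               [[0.8, 0.2], [0.3, 0.7]])
```

Because this is EM, the entries of `history` should never decrease; that is a cheap sanity check when adapting the recursions to richer model structures.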

Anyone have any thoughts on whether Factorie is a good way to approach the above?

Peter Schüller

Dec 16, 2016, 12:27:27 AM
to dis...@factorie.cs.umass.edu
Hello Peter,

In 2014 I did some work on German NER with CRF-like structures (see the figures in [1]). The result was that a plain CRF works best. If you want, I can send you my code, but it is from 2014 and the Factorie API may have changed since then.

Best,
Peter

[1] Peter Schüller. MoSTNER: Morphology-aware split-tag German NER with Factorie. In: KONVENS GermEval Shared Task on Named Entity Recognition, October 2014. http://www.peterschueller.com/pub/2014/2014_mostner_morphology_aware_split_tag_german_ner_with_factorie.pdf

--
--
Factorie Discuss group.
To post, email: dis...@factorie.cs.umass.edu
To unsubscribe, email: discuss+unsubscribe@factorie.cs.umass.edu

---
You received this message because you are subscribed to the Google Groups "Factorie" group.
To unsubscribe from this group and stop receiving emails from it, send an email to discuss+unsubscribe@factorie.cs.umass.edu.



--
Peter SCHÜLLER, Assistant Professor Dr
Marmara University, Computer Engineering Department
http://www.peterschueller.com/

Peter Muhlberger

Dec 16, 2016, 10:20:05 AM
to dis...@factorie.cs.umass.edu
Hi Peter: Thanks so much for your response! I'm about to travel for the holidays, so I'll need a little time to look at what you sent me. I would certainly like to see a worked-out example of CRF code in Factorie.

A concern for me is whether Factorie can handle roughly 12,000 parameters, which is my estimate of what my eventual application will need. Stan, which is written in C++, can't handle that many in a reasonable amount of time on a workstation. And I'd imagine Factorie's MCMC isn't as sophisticated as Stan's (which uses NUTS). On the other hand, Stan does a lot of extra work to marginalize over discrete variables (using the forward algorithm), because it can't sample them directly. Factorie's variational methods sound like they may be fast, but I wonder about their accuracy. EM is likely to be faster than MCMC, and Factorie appears to have an implementation of it, one I hope allows flexible model specification. Factorie does look exciting, but it also seems to have a steep learning curve.
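Since Stan cannot sample discrete parameters, an HMM there is typically written by summing the hidden states out with the forward algorithm, which costs O(T K^2) for T steps and K states instead of enumerating all K^T state paths. A minimal plain-Python sketch of that marginalization in log space (the function names are hypothetical; this is neither Stan nor Factorie code):

```python
import math

def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))

def hmm_marginal_loglik(log_emit, log_pi, log_A):
    """log p(observations), with the discrete hidden states summed out.

    log_emit[t][k]: log p(observations at step t | state k)
    log_pi[k]: log initial probs; log_A[j][k]: log transition j -> k.
    """
    K = len(log_pi)
    alpha = [log_pi[k] + log_emit[0][k] for k in range(K)]
    for t in range(1, len(log_emit)):
        # alpha[k] = log p(obs[0..t], z_t = k)
        alpha = [log_emit[t][k] + log_sum_exp([alpha[j] + log_A[j][k] for j in range(K)])
                 for k in range(K)]
    return log_sum_exp(alpha)

# Usage: 2 states, 3 steps, with illustrative probabilities.
log_pi = [math.log(0.6), math.log(0.4)]
log_A = [[math.log(0.7), math.log(0.3)], [math.log(0.4), math.log(0.6)]]
log_emit = [[math.log(0.5), math.log(0.1)],
            [math.log(0.4), math.log(0.3)],
            [math.log(0.1), math.log(0.6)]]
ll = hmm_marginal_loglik(log_emit, log_pi, log_A)
```

For a toy case like this, the result can be checked against brute-force enumeration of all 2^3 state paths.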

Happy holidays!
Peter

Peter Schüller

Dec 19, 2016, 3:12:27 AM
to dis...@factorie.cs.umass.edu, Peter Muhlberger
Hello Peter,

I have attached my code and the pom.xml that I used. Maybe it can be useful.

Happy holidays!
Peter

peter-schueller-germeval2014-factorie.tar.gz

Peter Muhlberger

Dec 19, 2016, 11:19:44 PM
to dis...@factorie.cs.umass.edu
Hi Peter:  Thanks again!  

Have a wonderful holiday as well,
Peter