We're using Factorie to create a CRF for query understanding, both to segment
words into search phrases and to label them with various categories. So we have
something like the following:

sealed trait Label
case object Name extends Label
case object Company extends Label
case object Skill extends Label
case object Location extends Label

sealed trait Position
case object Begin extends Position
case object Internal extends Position

The CRF's hidden Markov states are each a (Label, Position) pair. The Label
is the category of a phrase, and the Position marks phrase boundaries: a
phrase boundary always precedes a Begin Position and nothing else, so every
phrase is one Begin token followed by zero or more Internal tokens.
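To make the state space concrete, here is a small plain-Scala sketch (no Factorie involved) that enumerates the (Label, Position) pairs and recovers phrase boundaries from a sequence of states. StateSpace, startsPhrase and boundaries are hypothetical names used only for illustration.

```scala
// Self-contained copies of the Label and Position types above.
sealed trait Label
case object Name extends Label
case object Company extends Label
case object Skill extends Label
case object Location extends Label

sealed trait Position
case object Begin extends Position
case object Internal extends Position

object StateSpace {
  val labels: Seq[Label] = Seq(Name, Company, Skill, Location)
  val positions: Seq[Position] = Seq(Begin, Internal)

  // Every hidden state is one (Label, Position) pair: |Labels| * |Positions| states.
  val states: Seq[(Label, Position)] =
    for (l <- labels; p <- positions) yield (l, p)

  // A new phrase starts exactly at the states whose Position is Begin.
  def startsPhrase(state: (Label, Position)): Boolean = state._2 == Begin

  // Token indices at which a phrase starts, read off a state sequence.
  def boundaries(seq: Seq[(Label, Position)]): Seq[Int] =
    seq.zipWithIndex.collect { case (s, i) if startsPhrase(s) => i }
}
```

With four labels this gives eight states, and a sequence such as (Name, Begin), (Name, Internal), (Skill, Begin) has phrase boundaries at tokens 0 and 2.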

We model this in Factorie as follows, pretty much just following the
linear-chain CRF examples in the tutorials:

class Query extends Chain[Query, Token]

class Token(val word: String, label: Label, pos: Position)
    extends FeatureVectorVariable[String] with ChainLink[Token, Query] {
  // boilerplate like the example CRF code
}

object CRFLabelDomain extends CategoricalDomain[(Label, Position)]

class CRFLabel(label: Label, pos: Position, val token: Token)
    extends LabeledCategoricalVariable((label, pos)) {
  // boilerplate elided
}

class FactorieCRF(val tokenDomain: CategoricalVectorDomain[String])
    extends TemplateModel with Parameters {

  // One weight per (previous label state, current label state) pair.
  object transition extends DotTemplateWithStatistics2[CRFLabel, CRFLabel] {
    val weights = Weights(new la.DenseTensor2(CRFLabelDomain.size, CRFLabelDomain.size))
    def unroll1(label: CRFLabel): Iterable[Factor] =
      if (label.hasPrev) Factor(label.prev, label) else Nil
    def unroll2(label: CRFLabel): Iterable[Factor] =
      if (label.hasNext) Factor(label, label.next) else Nil
  }

  // One weight per (label state, token feature) pair.
  object evidence extends DotTemplateWithStatistics2[CRFLabel, Token] {
    val weights = Weights(new la.DenseTensor2(CRFLabelDomain.size, tokenDomain.dimensionSize))
    def unroll1(label: CRFLabel): Iterable[Factor] = Factor(label, label.token)
    def unroll2(token: Token): Iterable[Factor] =
      throw new Error("Token values shouldn't change")
  }

  this += evidence
  this += transition
}
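Stripped of Factorie, the two templates above define an unnormalized score for a query: one evidence term per token plus one transition term per adjacent pair of labels. A minimal plain-Scala sketch of that sum follows. LinearChainScore is a hypothetical name, and the evidence and transition functions are stand-ins for the dot products Factorie computes from the weights tensors, not Factorie API.

```scala
object LinearChainScore {
  // evidence(state, word): score of one state against its token.
  // transition(prev, next): score of two adjacent states.
  def score[S, W](states: Seq[S], words: Seq[W])(
      evidence: (S, W) => Double,
      transition: (S, S) => Double): Double = {
    require(states.size == words.size, "one state per token")
    val evidenceTotal =
      states.zip(words).map { case (s, w) => evidence(s, w) }.sum
    // Each adjacent pair of states contributes exactly one transition term.
    val transitionTotal =
      states.zip(states.drop(1)).map { case (a, b) => transition(a, b) }.sum
    evidenceTotal + transitionTotal
  }
}
```

Both amendments below only change the transition term of this sum; the evidence term is untouched.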

There are two ways I want to amend this model.

1) Rule out illegal state transitions. A transition from (Name, Internal)
to (Skill, Internal) is illegal: a transition into any Internal state can only
come from a previous state with the same Label.
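The legality rule can be written as a small predicate. This is a plain-Scala sketch under the state encoding above; Transitions, isLegal and illegalPairs are hypothetical names, not Factorie API.

```scala
sealed trait Label
case object Name extends Label
case object Company extends Label
case object Skill extends Label
case object Location extends Label

sealed trait Position
case object Begin extends Position
case object Internal extends Position

object Transitions {
  type State = (Label, Position)

  val states: Seq[State] =
    for (l <- Seq[Label](Name, Company, Skill, Location);
         p <- Seq[Position](Begin, Internal)) yield (l, p)

  // Any state may precede a Begin state (a new phrase starts); an Internal
  // state may only follow a state with the same Label (the phrase continues).
  def isLegal(prev: State, next: State): Boolean = next match {
    case (_, Begin)     => true
    case (l2, Internal) => prev._1 == l2
  }

  // Every (prev, next) pair the model should rule out.
  val illegalPairs: Seq[(State, State)] =
    for (a <- states; b <- states if !isLegal(a, b)) yield (a, b)
}
```

With four labels, 24 of the 64 possible transitions are illegal: for each of the four Internal targets, the six predecessor states that carry a different Label.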

I tried setting the illegal transition weights to -∞, but that caused learning
to crash. It seems like those transitions just shouldn't be parameters, so I
think I should be able to do something like the following (with unroll2
elided and label.prev expressed as an Option for clarity):

object transition extends Template2[CRFLabel, CRFLabel] with ??? {
  val weights = // some appropriately sized tensor
  def unroll1(label: CRFLabel): Iterable[Factor] = (label.prev, label) match {
    case (None, _) => Nil
    case (Some(fst), snd @ (_, Begin)) => Factor(weights((fst, snd)))
    case (Some(fst @ (l1, _)), snd @ (l2, Internal)) =>
      if (l1 == l2) Factor(weights((fst, snd)))
      else Factor(Double.NegativeInfinity)
  }
}

Is this possible? I'm not sure how to express this correctly in Factorie.
This definitely doesn't seem like the right use of Family2.Factor. Instead,
it looks like I should provide a score definition directly somehow, but how
to do that eludes me.
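Independent of Factorie's API, the idea can be sketched by keeping the constraint in the score function rather than in the weights. In the plain-Scala sketch below, ConstrainedTransitions, score and perceptronStep are hypothetical names, and the update is a simple perceptron-style step, not Factorie's learner. Illegal pairs score -∞ without ever reading a weight, and updates only touch legal entries, so no stored parameter becomes infinite.

```scala
sealed trait Label
case object Name extends Label
case object Company extends Label
case object Skill extends Label
case object Location extends Label

sealed trait Position
case object Begin extends Position
case object Internal extends Position

object ConstrainedTransitions {
  type State = (Label, Position)

  val labels: Seq[Label] = Seq(Name, Company, Skill, Location)
  val positions: Seq[Position] = Seq(Begin, Internal)
  val states: IndexedSeq[State] =
    for (l <- labels.toIndexedSeq; p <- positions) yield (l, p)

  // Internal states may only follow a state with the same Label.
  def isLegal(prev: State, next: State): Boolean =
    next._2 == Begin || prev._1 == next._1

  // Dense |states| x |states| weights; entries for illegal pairs are never read or written.
  val weights: Array[Array[Double]] = Array.fill(states.size, states.size)(0.0)

  private def idx(s: State): Int = states.indexOf(s)

  // The hard constraint lives here, outside the parameters.
  def score(prev: State, next: State): Double =
    if (isLegal(prev, next)) weights(idx(prev))(idx(next))
    else Double.NegativeInfinity

  // Hypothetical perceptron-style step: illegal pairs are skipped,
  // so every stored weight stays finite.
  def perceptronStep(prev: State, next: State, delta: Double): Unit =
    if (isLegal(prev, next)) weights(idx(prev))(idx(next)) += delta
}
```

The design choice is that -∞ is a property of the scoring rule, not a learned value, which matches the intuition that those transitions should not be parameters at all.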

2) Because of limitations in our training data, we really can't estimate
transitions between phrases within a query; we can only estimate approximate
distributions of likely segment lengths. So it seems reasonable to have
parameters only for the within-phrase transitions of each Label, such as
(Name, Begin) -> (Name, Internal) and (Name, Internal) -> (Name, Internal),
and to replace the second case in my pseudo-Scala above with:

   case (Some(fst), snd@(_, Begin)) => 0.0

and have a transition parameter tensor with only 2*|Labels| weights.
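A plain-Scala sketch of this reduced parameterization, assuming a hypothetical layout where weight 2*i is label i's Begin -> Internal transition and weight 2*i + 1 is its Internal -> Internal transition. SegmentLengthTransitions, weightIndex and phraseScore are illustrative names, not Factorie API. Transitions into a Begin state score 0.0, a label change inside a phrase scores -∞, and phraseScore shows how the two weights per label add up over a phrase of n tokens.

```scala
sealed trait Label
case object Name extends Label
case object Company extends Label
case object Skill extends Label
case object Location extends Label

sealed trait Position
case object Begin extends Position
case object Internal extends Position

object SegmentLengthTransitions {
  type State = (Label, Position)

  val labels: IndexedSeq[Label] = IndexedSeq(Name, Company, Skill, Location)

  // 2 * |Labels| weights in total.
  val weights: Array[Double] = Array.fill(2 * labels.size)(0.0)

  // Only same-label transitions into an Internal state have a parameter.
  def weightIndex(prev: State, next: State): Option[Int] = (prev, next) match {
    case ((l1, p1), (l2, Internal)) if l1 == l2 =>
      val i = labels.indexOf(l2)
      Some(if (p1 == Begin) 2 * i else 2 * i + 1)
    case _ => None
  }

  def score(prev: State, next: State): Double = next match {
    case (_, Begin) => 0.0 // no parameters for transitions between phrases
    case (_, Internal) =>
      weightIndex(prev, next) match {
        case Some(k) => weights(k)
        case None    => Double.NegativeInfinity // label changed mid-phrase
      }
  }

  // Transition score of one phrase of n tokens: one Begin -> Internal step
  // plus (n - 2) Internal -> Internal steps when n >= 2.
  def phraseScore(label: Label, n: Int): Double = {
    require(n >= 1, "a phrase has at least one token")
    val i = labels.indexOf(label)
    if (n == 1) 0.0 else weights(2 * i) + (n - 2) * weights(2 * i + 1)
  }
}
```

Ignoring the evidence factors, the score grows linearly in n, so exponentiating it gives a geometric-style preference over phrase lengths for each label, which is the kind of segment-length information described above.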

Any guidance as to how I can do this?

Thank you!

