I'm having trouble finding the right variable type to use in a Spark cluster environment. I've trained a Factorie classifier on a local training set, and I'd like to apply it to a large data set using Spark. However, my class that extends LabeledCategoricalVariable won't deserialize, because LabeledCategoricalVariable has no no-arg constructor.
// Example variable definition
object MyMatchDomain extends CategoricalDomain[String] with Serializable

class MyMatch(var id1: String, var id2: String, var label: String)
    extends LabeledCategoricalVariable[String](label) with Serializable {

  override def domain: CategoricalDomain[String] = MyMatchDomain

  def this() = this("", "", "")
}
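To show the failure without Spark or Factorie, here is a minimal repro of what I believe is going on, assuming the first non-serializable class in LabeledCategoricalVariable's constructor chain has no no-arg constructor (`Base` below is my stand-in for that superclass, not a Factorie type):

```scala
import java.io._

// Stand-in for the non-serializable superclass: no no-arg constructor.
class Base(val label: String)

// Serializable subclass, analogous to MyMatch above.
class Leaf(val id: String, label: String)
    extends Base(label) with Serializable

object Repro {
  // Serialize and immediately deserialize a value.
  def roundTrip[T <: AnyRef](v: T): T = {
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    oos.writeObject(v); oos.close()
    val ois = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray))
    ois.readObject().asInstanceOf[T]
  }

  def main(args: Array[String]): Unit =
    try roundTrip(new Leaf("a", "pos"))
    catch {
      // Writing succeeds; reading fails, because Java deserialization
      // requires a no-arg constructor on the first non-serializable
      // superclass (Base here).
      case e: InvalidClassException => println(e.getMessage)
    }
}
```

The no-arg constructor I added on MyMatch itself doesn't help, because it is the superclass's constructor that deserialization needs to invoke.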
Serialization works fine when I extend CategoricalVariable, but I need the Labeled version for classification. Is there another variable type I should use instead? Or am I missing something?
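The only fallback I can think of is to ship plain fields across the cluster and rebuild the variable inside the Spark closure, which I'd rather avoid. A sketch of that idea (`MatchRow` is a hypothetical carrier type of my own, not part of Factorie; the `rdd.map` line in the comment is how I'd imagine using it):

```scala
import java.io._

// Carry only plain fields across the wire (a case class is Serializable
// by default) and rebuild MyMatch on the executor, e.g.
//   rdd.map(r => classifier.classify(new MyMatch(r.id1, r.id2, r.label)))
case class MatchRow(id1: String, id2: String, label: String)

object Fallback {
  // Round-trip through Java serialization, as Spark would when shipping
  // the row to an executor.
  def copy(r: MatchRow): MatchRow = {
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    oos.writeObject(r); oos.close()
    new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray))
      .readObject().asInstanceOf[MatchRow]
  }
}
```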