I've submitted my system for the Dependency induction task.
Unfortunately, my data exceeds Google Groups' 4 MB limit, because I've
made predictions for all types of POS tags (CPOS, POS, UPOS). So
I've sent my predictions only to Douwe.
Here is the system description:
Our approach is based on a dependency model that consists of three
submodels: (i) an edge model (similar to P_CHOOSE in DMV), (ii) a
fertility model (modeling the number of children of a given head), and
(iii) a reducibility model. The fertility model exploits the
observation that the fertility of function words (typically the most
frequent words in the corpus) is more predictable than the fertility
of content (less frequent) words. Reducibility is a feature of
individual part-of-speech tags. We compute it from the reducibility of
words in a large unannotated corpus (we used Wikipedia articles). A
word is reducible if the sentence remains grammatically correct after
the word is removed. The grammaticality of such newly created
sentences is tested by searching for them in the corpus.
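Roughly, the test can be sketched in Python as follows (an
illustrative simplification, not our actual code; the function names
and the plain relative-frequency estimate are assumptions):

    from collections import Counter

    def sentence_counts(corpus_sentences):
        # Count each sentence (as a tuple of tokens) in the large corpus.
        return Counter(tuple(s) for s in corpus_sentences)

    def tag_reducibility(tagged_sentences, counts):
        # Estimate per-tag reducibility as the fraction of word
        # occurrences whose removal yields a sentence attested
        # elsewhere in the corpus.
        seen, reduced = Counter(), Counter()
        for sent in tagged_sentences:          # sent = [(word, tag), ...]
            words = [w for w, t in sent]
            for i, (w, t) in enumerate(sent):
                seen[t] += 1
                if counts[tuple(words[:i] + words[i + 1:])] > 0:
                    reduced[t] += 1
        return {t: reduced[t] / seen[t] for t in seen}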
The inference itself was done on the test corpus using Gibbs
sampling. Three hyperparameters were tuned on the English Penn
Treebank with the fine-grained POS tags (the 5th column in the given
CoNLL format). For parsing the other languages and for all types of
tags (CPOS, POS, UPOS), we used the same parameter settings.
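For illustration, one Gibbs update over head assignments could look
like the toy sketch below (heavily simplified: score() stands in for
the product of the three submodels, multiple attachments to the
artificial root are allowed, and projectivity is ignored; this is not
our actual sampler):

    import math
    import random

    def creates_cycle(heads, dep, new_head):
        # Attaching dep under new_head creates a cycle iff the walk
        # from new_head up to the root (-1) passes through dep.
        node = new_head
        while node != -1:
            if node == dep:
                return True
            node = heads[node]
        return False

    def gibbs_step(heads, tags, score, rng=random):
        # Resample each word's head conditioned on all other edges.
        for d in range(len(heads)):
            cands, logps = [], []
            for h in [-1] + list(range(len(heads))):   # -1 = root
                if h == d or (h != -1 and creates_cycle(heads, d, h)):
                    continue
                heads[d] = h
                cands.append(h)
                logps.append(score(heads, tags))  # log P(tree)
            m = max(logps)
            weights = [math.exp(lp - m) for lp in logps]
            heads[d] = rng.choices(cands, weights=weights)[0]
        return heads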
The only additional data (not provided by the organizers) are the
unannotated monolingual Wikipedia texts. (They were automatically POS
tagged with the TnT tagger trained on the provided corpora.)
Best,
David Marecek