As far as I can see, the paper by Dan Klein and Christopher Manning:
Klein, D. & Manning, C. D.: Corpus-Based Induction of Syntactic Structure:
Models of Dependency and Constituency. Proceedings of ACL 2004, 478-485.
has already been discussed. I don't know if there is any interest in
the history of unsupervised grammar induction (UGI), but if there is,
let me know and I could also bring up some papers about older
approaches. Good results were also reported by:
Bod, R.: Is the End of Supervised Parsing in Sight? Proceedings of ACL
2007, Prague, 400-407.
and:
Seginer, Y.: Fast Unsupervised Incremental Parsing. Proceedings of the
45th Annual Meeting of the Association for Computational Linguistics
(ACL 2007), 384-391.
The latter is interesting because (at least compared to other
approaches) it is extremely fast and works reasonably well without
clustering.
A paper that reported good results and takes a surprisingly simple
approach is:
Hänig, C.; Bordag, S. & Quasthoff, U.: UnsuParse: Unsupervised Parsing
with Unsupervised Part of Speech Tagging. Proceedings of the Sixth
International Conference on Language Resources and Evaluation
(LREC'08), European Language Resources Association (ELRA), 2008.
And I had to include this one because it leads to another topic of
interest: the unsupervised induction of POS tags. Most of the
approaches mentioned above perform considerably better when they work
on tagged text instead of plain text. The last paper reports that POS
tags induced in an unsupervised way perform just as well (for their
approach) as traditional (Stanford-tagger-style) POS tags. This would
mean that it is possible to get decent linguistic structure without
using any linguistic information beyond the text itself and the
construction of the learning algorithm. If there is interest, we could
discuss the unsupervised induction of POS tags: it is a field that (to
the best of my knowledge) has made some interesting advances in recent
years, it avoids the discussion about the justification of POS
categories by basing them simply on statistical observations, and it
is useful for UGI.
To close this up:
Within the coming week, everybody can make up their minds whether they
are interested in this discussion. If you are, tell me what you would
like as a starting point. Options are: the history of UGI (don't
worry, I won't go back to Harris), unsupervised POS tagging, current
UGI (one of the papers mentioned above, name your favourite), or the
evaluation of UGI.
If we decide on a starting point that is not one of the papers above,
I'll propose some papers and we can pick one a few days later. I'm
willing to moderate the first few papers, and then we'll see from
there.
-Menaka
Summary (of the Seginer paper):
- presents a completely unsupervised approach
- uses the common cover link approach
- common cover links are another way to express context-free
structure, but with added direction
- the approach could just as well work with CFGs
- all subtrees are assumed to be skewed, i.e. there is always one
short arm
- the resulting parser is meant to be used incrementally
- it performs all possible reductions (i.e. cover link introductions)
whenever new input is read
- after that, reductions are only applied when a new word is read, and
only reductions involving that word
- at each training stage, the current lexicon is used to create a
parse tree
- adjacency statistics are read off these parse trees
- features extracted:
-- most common adjacent words
-- most common words appearing in the same position (derived from the
words that are adjacent)
-- is it possible to 'stop' on one side of the word (i.e. is there
usually punctuation or an end-of-sentence marker at that position)
-- based on the last one, should edges originate from or go to this
word
- these statistics are then used again in parsing (a rough sketch of
how such statistics could be collected follows below)
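To make the feature list above a bit more concrete, here is a minimal
sketch of how such adjacency statistics could be collected. This is
not Seginer's actual implementation (his algorithm reads the counts
off its own incremental parses, not off raw token sequences), and the
function names, the stop-token set and the position-similarity
heuristic are my own assumptions.

from collections import Counter, defaultdict

# Assumed boundary markers for the 'stop' feature.
STOP_TOKENS = {".", ",", ";", ":", "!", "?"}

def collect_adjacency_stats(sentences):
    """Collect per-word adjacency and boundary ('stop') counts from
    tokenised sentences (lists of strings)."""
    left_neighbours = defaultdict(Counter)   # word -> words seen directly to its left
    right_neighbours = defaultdict(Counter)  # word -> words seen directly to its right
    left_stop = Counter()                     # word -> how often a boundary is on its left
    right_stop = Counter()                    # word -> how often a boundary is on its right
    totals = Counter()                        # word -> frequency
    for tokens in sentences:
        padded = ["<S>"] + list(tokens) + ["</S>"]
        for i in range(1, len(padded) - 1):
            w, l, r = padded[i], padded[i - 1], padded[i + 1]
            totals[w] += 1
            left_neighbours[w][l] += 1
            right_neighbours[w][r] += 1
            if l in STOP_TOKENS or l == "<S>":
                left_stop[w] += 1
            if r in STOP_TOKENS or r == "</S>":
                right_stop[w] += 1
    return left_neighbours, right_neighbours, left_stop, right_stop, totals

def similar_by_position(word, left_neighbours, right_neighbours, top_n=10):
    """Words that share many neighbours with `word` -- a crude version of
    'most common words appearing in the same position'."""
    scores = Counter()
    for other in left_neighbours:
        if other == word:
            continue
        shared = set(left_neighbours[word]) & set(left_neighbours[other])
        shared |= set(right_neighbours[word]) & set(right_neighbours[other])
        scores[other] = len(shared)
    return [w for w, _ in scores.most_common(top_n)]

The 'stop' counts are meant to correspond to the last two feature
items: roughly, if a word almost always has a boundary on one side,
links should not be expected to cross that side.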
Things I found interesting:
- there is no statistical re-analysis of the adjacency data, which
means that frequent words are important in the structuring
-- will this also work in languages that lack function words?
- there is also no clustering, only the search for matching label
pairs
-- might this 'smoothing' work better if the words were clustered by
their adjacency links?
- it is common to use unlabeled precision for evaluation (a small
sketch of that metric follows below), and the Klein model (in its
simplest form) also has only one label for all constituents; but if
the structure is to be used in other algorithms (especially machine
learning), some labeling of the constituents may be useful
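Since unlabeled precision keeps coming up, here is a minimal sketch of
how unlabeled bracketing precision/recall is typically computed, with
a parse reduced to a set of (start, end) spans over token indices; the
function name and the toy example are mine, not taken from any of the
papers above.

# Unlabeled bracketing precision/recall/F1: constituent labels are
# ignored, only the spans themselves are compared.
def unlabeled_prf(gold_spans, predicted_spans):
    gold, pred = set(gold_spans), set(predicted_spans)
    matched = len(gold & pred)
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example: gold brackets for "the dog barks" vs. a right-branching guess.
# gold:  (0, 3) = whole sentence, (0, 2) = "the dog"
# guess: (0, 3) = whole sentence, (1, 3) = "dog barks"
print(unlabeled_prf({(0, 3), (0, 2)}, {(0, 3), (1, 3)}))  # -> (0.5, 0.5, 0.5)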