Statistical parsing

Stuti Ajmani

Jun 3, 2011, 2:22:17 AM

to link-grammar

Hi

Link grammar tends to fail on long sentences, which makes ranking of
parses essential. My current task is to rank parses in order to get
correct results for very long sentences. The parser now contains some
experimental code for using corpus statistics to provide a parse ranking,
and to assign WordNet word senses to words based on their grammatical
usage. I built the parser with corpus statistics enabled, but it gives me
this kind of result for every parse of every sentence:
cost vector = (CORP=17 UNUSED=0 DIS=1 AND=0 LEN=5), with CORP = 17 always.
CORP should represent the importance (weight) of the parse. I have
downloaded the correct database for statistical parsing from
http://www.abisource.com/downloads/link-grammar/sense-dictionary/
and can't work out why this problem occurs. I need a little help.

Secondly, the corpus statistics in link grammar are based on word sense,
but there are many other algorithms, such as an SVM-based voting
algorithm, the grammatical trigram approach, context-free grammar and
word statistics, etc. Which approach would be best to implement?

Linas Vepstas

Jun 4, 2011, 7:46:39 PM

to link-g...@googlegroups.com

On 3 June 2011 01:22, Stuti Ajmani <stuti...@iiitd.ac.in> wrote:
> Hi
>
> Link grammar tends to fail when it comes to long sentences, making it
> essential to do ranking of parses. So my current task is to do ranking
> of parses in order to get correct results for very long sentences. The
> parser now contains some experimental code for using corpus statistics
> to provide a parse ranking, and to assign WordNet word senses to word,
> based on their grammatical usage. I built the parser with this corpus
> stats enabled but it's giving me the result for all parses, all
> sentences, i.e. cost vector = (CORP=17 UNUSED=0 DIS=1 AND=0 LEN=5)
> , CORP = 17 always. CORP should represent the importance (weight) of
> the parse. I have downloaded the correct database for statistical
> parsing from http://www.abisource.com/downloads/link-grammar/sense-dictionary/
> . Can't really think why this problem is there. Need a little help.

When you start the parser, it should print something similar to

"Info: Corpus statistics database found at
/usr/local/share/link-grammar/sql/disjuncts.db"

Otherwise, it will print warnings. Do you see either message?
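
One quick way to rule out an unreadable or empty database file is to open it directly. The sketch below assumes the statistics file is an SQLite database (an assumption, not something stated in this thread) and uses the path from the example message; `list_tables` is a hypothetical helper name. It only lists table names and does not assume any particular schema.

```python
import os
import sqlite3
from pathlib import Path

# Path taken from the example "Info:" message; adjust to your install prefix.
DB_PATH = "/usr/local/share/link-grammar/sql/disjuncts.db"


def list_tables(path):
    """Return the table names in an SQLite file, or raise if it cannot be read.

    Assumes the file is SQLite; no specific schema is assumed.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    # Open read-only so that a wrong path never creates an empty database.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    try:
        print(list_tables(DB_PATH))
    except (OSError, sqlite3.Error) as err:
        print("cannot read database:", err)
```

An empty list or an error here would point at the download or the install location rather than at the parser itself.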

My guess is that it failed to find or open the database; the value
17 is the 'worst possible ranking'. It's the (negative) log-2 of a
probability, so a value of 17 corresponds to a probability of 2^-17,
which is a very small number.
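
To make the arithmetic concrete: if the CORP score is read as the negative base-2 logarithm of a probability, as the explanation above suggests, the conversion is a one-liner. The function names here are hypothetical and not part of link-grammar.

```python
import math


def cost_to_probability(cost):
    """Interpret a cost as -log2(p) and return p."""
    return 2.0 ** (-cost)


def probability_to_cost(p):
    """Inverse mapping, p -> -log2(p); p must lie in (0, 1]."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)


print(cost_to_probability(17))  # 2**-17, roughly 7.6e-06
```

Under this reading, a parse that always scores 17 is being assigned a probability of about one in 131072, which fits a fallback value rather than a real statistic.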

> Secondly corpus statistics of link grammar is based on word sense, but
> there are many more algorithms such as svm based voting algorithm, the
> grammatical trigram approach, context free grammar and word statistic,
> etc. Which approach should be the best to implement?

That, I don't know; I can't say that I've seen many (any?) papers
comparing ranking algorithms; I haven't studied the problem.

--linas

Ben Goertzel

Jun 4, 2011, 11:25:27 PM

to link-g...@googlegroups.com

To do really great parse ranking you'll need a corpus of some parsed sentences annotated with the correct parses. Then you can use a supervised learning approach...

-- Ben G
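
As a rough illustration of the supervised approach: given sentences whose candidate parses are each described by a cost vector, plus the index of the parse an annotator marked as correct, a simple perceptron can learn weights for combining the costs. This is only a sketch. The perceptron is one possible choice and is not something proposed in this thread, the feature names are borrowed from the cost vector quoted earlier, and `train_ranker` and `best_parse` are hypothetical names rather than link-grammar APIs.

```python
from typing import Dict, List, Sequence, Tuple

# Feature names borrowed from the cost vector shown in the thread.
FEATURES = ("CORP", "UNUSED", "DIS", "AND", "LEN")

CostVector = Dict[str, float]
# One training example: the candidate parses of a sentence and the index
# of the parse that was annotated as correct.
Example = Tuple[Sequence[CostVector], int]


def total_cost(weights: CostVector, costs: CostVector) -> float:
    """Weighted sum of the individual costs; lower is better."""
    return sum(weights[name] * costs[name] for name in FEATURES)


def best_parse(weights: CostVector, candidates: Sequence[CostVector]) -> int:
    """Index of the candidate with the lowest weighted cost."""
    return min(range(len(candidates)),
               key=lambda i: total_cost(weights, candidates[i]))


def train_ranker(examples: List[Example], epochs: int = 10,
                 rate: float = 0.1) -> CostVector:
    """Perceptron-style training (hypothetical helper, not a link-grammar API)."""
    weights = {name: 1.0 for name in FEATURES}
    for _ in range(epochs):
        for candidates, gold in examples:
            predicted = best_parse(weights, candidates)
            if predicted == gold:
                continue
            # Make the wrongly chosen parse more expensive relative to the
            # annotated one by moving the weights along their difference.
            for name in FEATURES:
                weights[name] += rate * (candidates[predicted][name]
                                         - candidates[gold][name])
    return weights
```

The learned weights are then applied with `best_parse(weights, candidates)` to pick one parse per new sentence. How well this works depends entirely on the size and quality of the annotated corpus.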

Stuti Ajmani

Jun 6, 2011, 4:59:54 AM

to link-grammar

@Linas
Thanks for the reply.
I am getting the following message:
"Info: Corpus statistics database found."

The problem still persists.

Linas Vepstas

Jun 6, 2011, 11:17:38 AM

to link-g...@googlegroups.com

On 6 June 2011 03:59, Stuti Ajmani <stuti...@iiitd.ac.in> wrote:
> @Lenas
> Thanks for the reply.
> I am getting the following message
> "Info: Corpus statistics database found.
>
> Still the error prevails.

And no other warnings or errors are printed?

I've attached a debug copy of corpus.c to this email; please copy it to
link-grammar/corpus/corpus.c and recompile and reinstall. Run it and then
send me *all* the output.