Questions about Verb tense features and Parse Quality

Richard

Apr 18, 2011, 1:28:23 PM

to link-grammar

Hi,

I'm a Master's student in NLP at the University of Ottawa, and I want
to use Relex for Verb Classification. For those who are interested,
I'll describe my work briefly at the end of this message, but I'd like
to start with my two questions.

First, Relex produces a tense feature tag when it comes across verbs.
From the OpenCog wiki (http://wiki.opencog.org/w/Word_properties), I
see that the possible values are {future, future progressive,
imperative, infinitive, past, past infinitive, past progressive,
perfect, present, present progressive, progressive}. When I examined
Relex's output, however, I noticed that <blank> is a possible value
(for cannot and don't), as are '.v' and '.v-d'. Also, many (possibly
all) of the tenses are followed by a pipe (|), and other tags, like
the aforementioned '.v' and '.v-d', and also 'polyword' or 'idiom'. Is
there a document somewhere which describes this tag in more detail?

Second, I know that Relex, or rather Link-Grammar, produces multiple
ranked parses of each sentence. I'm wondering how much variance in
quality there is between the top and bottom ranked parses. I'd like to
use more than just the first parse, but I'm worried about including
very unlikely parses which will just add noise to my data.

As promised, here is a short description of my work. I'm trying to put
verbs into general classes, like 'Activity', 'Event', 'Process',
'State', etc (I have 8 in total). The classes are mostly based on
Aspectual considerations, which is why I'm interested in verb tenses.
I want to use distributional analysis (using SuperMatrix) to try and
classify verbs, and I'm using Relex to extract the contexts that verbs
appear in. Contexts, in my case, are the relationships the verb
appears in, such as: subj:name, obj:band, to:portion, and so on.
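
A minimal sketch of how such contexts could be collected into per-verb count vectors before distributional analysis. The (verb, relation, word) triples and the function name are hypothetical; this shows neither Relex's output format nor the SuperMatrix API, only the "relation:word" bookkeeping described above.

```python
from collections import Counter, defaultdict


def build_context_vectors(observations):
    """Map each verb to a Counter of 'relation:word' contexts."""
    vectors = defaultdict(Counter)
    for verb, relation, word in observations:
        vectors[verb][f"{relation}:{word}"] += 1
    return vectors


# Hypothetical triples; in practice these would be extracted from parser output.
observations = [
    ("call", "subj", "name"),
    ("call", "obj", "band"),
    ("give", "to", "portion"),
    ("call", "obj", "band"),
]

vectors = build_context_vectors(observations)
print(vectors["call"]["obj:band"])  # 2
```

Each verb's Counter can then be treated as a sparse feature vector for whatever similarity measure or classifier is used downstream.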

Regards,
Richard

Ben Goertzel

Apr 18, 2011, 9:58:21 PM

to link-g...@googlegroups.com

> Second, I know that Relex, or rather Link-Grammar, produces multiple
> ranked parses of each sentence. I'm wondering how much variance in
> quality there is between the top and bottom ranked parses. I'd like to
> use more than just the first parse, but I'm worried about including
> very unlikely parses which will just add noise to my data.

The top few parses will usually be enough to look at...

Ruiting Lian is currently working on statistical parse ranking for the
link parser; once that's done then it will help choose the best parse
from among the total set. This will be ready in weeks to months ;)
...

ben

Linas Vepstas

Apr 19, 2011, 11:49:15 AM

to link-g...@googlegroups.com

Hi Richard,

On 18 April 2011 12:28, Richard <r.ke...@gmail.com> wrote:

> First, Relex produces a tense feature tag when it comes across verbs.
> From the OpenCog wiki (http://wiki.opencog.org/w/Word_properties), I
> see that the possible values are {future, future progressive,
> imperative, infinitive, past, past infinitive, past progressive,
> perfect, present, present progressive, progressive}.

There might be even more possible combinations. The tags are
generated algorithmically; I haven't tried to maintain an exhaustive
list, as some combinations are rare.

> When I examined
> Relex's output, however, I noticed that <blank> is a possible value

That's a bug. Provide a detailed bug report, I can try to fix this.

> as are '.v' and '.v-d'.

These are link-grammar "subscripts". They are used to organize
link-grammar dictionary entries, and *usually* correspond to parts
of speech (but not always; gerunds are one exception). A list of these
can be found in section 3.4 of
http://www.abisource.com/projects/link-grammar/dict/introduction.html#3
Note that some verbs have w or q subscripts.

Relex is usually able to discern the tense of a verb based purely on
the link-grammar linkage; i.e. if link-this-and-such is on the left and
whatever is on the right, then it's past progressive. The one tense
where relex fails is plain old past tense; for these, the v-d subscript
is used as a "hint" (later relex rules may override the hint).
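
As a rough illustration of the subscript "hint" idea, here is a small sketch that splits a token such as walked.v-d into word and subscript and reports a past-tense hint. The function names are hypothetical and this is not how relex implements it; it only assumes that the subscript follows the last dot in the token.

```python
def split_subscript(token):
    """Split 'walked.v-d' into ('walked', 'v-d').

    Tokens with no dot, or with nothing before or after the last dot,
    are returned unchanged with an empty subscript.
    """
    word, dot, subscript = token.rpartition(".")
    if not dot or not word or not subscript:
        return token, ""
    return word, subscript


def tense_hint(token):
    """Return 'past' when the subscript is 'v-d', otherwise None."""
    _, subscript = split_subscript(token)
    return "past" if subscript == "v-d" else None


print(split_subscript("walked.v-d"))  # ('walked', 'v-d')
print(tense_hint("walked.v-d"))       # past
print(tense_hint("walks.v"))          # None
```

Since later rules may override the hint, a result like this is best treated as a default rather than a final answer.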

> Also, many (possibly
> all) of the tenses are followed by a pipe (|), and other tags, like
> the aforementioned '.v' and '.v-d', and also 'polyword' or 'idiom'. Is
> there a document somewhere which describes this tag in more detail?

You must be looking at the compact file format. Be aware that different
output formats represent these differently. The pipe is just
a separator between the various word features. These are listed here:
http://wiki.opencog.org/w/Word_properties
http://wiki.opencog.org/w/Query_variables
http://wiki.opencog.org/w/HYP

See section "relations and features" in http://wiki.opencog.org/w/RelEx
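
A minimal sketch of pulling the tense out of such a pipe-separated field. The field strings below are hypothetical, modeled on the tags mentioned in this thread rather than copied from real output, and the tense list is the one quoted from the wiki, which may be incomplete.

```python
# Tense values as listed in this thread (from the OpenCog wiki page);
# the list is not exhaustive, so treat it as an assumption.
KNOWN_TENSES = {
    "future", "future progressive", "imperative", "infinitive",
    "past", "past infinitive", "past progressive", "perfect",
    "present", "present progressive", "progressive",
}


def parse_feature_field(field):
    """Split a pipe-separated feature field into (tense, other_tags).

    tense is None when no known tense value is present, which also
    covers a blank field.
    """
    tags = [t.strip() for t in field.split("|") if t.strip()]
    tense = next((t for t in tags if t in KNOWN_TENSES), None)
    others = [t for t in tags if t != tense]
    return tense, others


# Hypothetical field values.
print(parse_feature_field("past|.v-d"))    # ('past', ['.v-d'])
print(parse_feature_field(".v|polyword"))  # (None, ['.v', 'polyword'])
```

Returning None for a missing tense keeps the blank case explicit, so such verbs can be counted or skipped rather than silently mislabeled.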

> Second, I know that Relex, or rather Link-Grammar, produces multiple
> ranked parses of each sentence. I'm wondering how much variance in
> quality there is between the top and bottom ranked parses. I'd like to
> use more than just the first parse, but I'm worried about including
> very unlikely parses which will just add noise to my data.

The best parses are usually the top-ranked ones, but the overall
question of ranking, correctness, etc. is a difficult open problem.
I think it's safe to use the first 2-4 or more. There's a floating-point
score in that file format; I think (but don't really know) that any
score within 30% of the highest score is probably a good parse.
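
That rule of thumb could be sketched as follows. This assumes that higher scores are better, that the best score is positive, and that "within 30%" means at least 70% of the best score; all three are guesses about the score's meaning, and the numbers are made up.

```python
def select_parses(scores, tolerance=0.30, max_parses=4):
    """Return indices of parses scoring within `tolerance` of the best.

    Assumes higher scores are better and the best score is positive.
    At most `max_parses` indices are returned, in the original order.
    """
    if not scores:
        return []
    cutoff = max(scores) * (1.0 - tolerance)
    keep = [i for i, s in enumerate(scores) if s >= cutoff]
    return keep[:max_parses]


# Hypothetical scores for five ranked parses of one sentence.
scores = [0.92, 0.88, 0.70, 0.41, 0.10]
print(select_parses(scores))  # [0, 1, 2]
```

Both the tolerance and the cap are parameters, so they can be tuned against a small hand-checked sample before being trusted on a whole corpus.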

> As promised, here is a short description of my work. I'm trying to put
> verbs into general classes, like 'Activity', 'Event', 'Process',
> 'State', etc (I have 8 in total). The classes are mostly based on
> Aspectual considerations, which is why I'm interested in verb tenses.
> I want to use distributional analysis (using SuperMatrix) to try and
> classify verbs, and I'm using Relex to extract the contexts that verbs
> appear in. Contexts, in my case, are the relationships the verb
> appears in, such as: subj:name, obj:band, to:portion, and so on.

Ohh! I like this! You realize, of course, that it might be possible
to get relex to spit out these classifications directly?

Let me describe how relex works. It takes, as input, a link-grammar
parse, and turns it into a graph. It then applies a sequence of rules
to transform the graph. For example: 'if this word has an S link to the
right, and it's a noun, then it is the subject of a verb phrase.' Some
rules generate intermediate markup, which later rules use to make a
final determination.
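
To make the rule idea concrete, here is a toy sketch of a two-stage rule pass over a tiny word graph. The Word class, the plain "S" label, and the "subj" relation name are all simplifications invented for illustration; relex's actual data structures and rule language are different.

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    text: str
    pos: str  # e.g. "noun" or "verb"
    # Link label -> index of the linked word to the right.
    right_links: dict = field(default_factory=dict)
    # Intermediate markup written by earlier rules.
    features: dict = field(default_factory=dict)


def mark_subject_candidates(words):
    """Earlier rule: a noun with an S link to the right gets subject markup."""
    for w in words:
        if w.pos == "noun" and "S" in w.right_links:
            w.features["subj_of"] = w.right_links["S"]


def emit_subject_relations(words):
    """Later rule: turn the intermediate markup into final relations."""
    relations = []
    for w in words:
        if "subj_of" in w.features:
            verb = words[w.features["subj_of"]]
            relations.append(("subj", verb.text, w.text))
    return relations


# Toy input standing in for a link-grammar parse of "John sings".
words = [Word("John", "noun", {"S": 1}), Word("sings", "verb")]
mark_subject_candidates(words)
print(emit_subject_relations(words))  # [('subj', 'sings', 'John')]
```

Splitting the work into a markup rule and an emitting rule mirrors the point that some rules only leave intermediate markup for later rules to act on.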

Perhaps it's possible to write rules to explicitly state contexts, and
add the verb-class tags?

--linas

Richard

May 6, 2011, 5:53:50 PM

to link-grammar

Thank you for your reply, and apologies for the lateness of my own.

> That's a bug. Provide a detailed bug report, I can try to fix this.

>
> > As promised, here is a short description of my work. I'm trying to put
> > verbs into general classes, like 'Activity', 'Event', 'Process',
> > 'State', etc (I have 8 in total). The classes are mostly based on
> > Aspectual considerations, which is why I'm interested in verb tenses.
> > I want to use distributional analysis (using SuperMatrix) to try and
> > classify verbs, and I'm using Relex to extract the contexts that verbs
> > appear in. Contexts, in my case, are the relationships the verb
> > appears in, such as: subj:name, obj:band, to:portion, and so on.
>
> Ohh! I like this! You realize, of course, that it might be possible
> to get relex to spit out these classifications directly?
>
> Let me describe how relex works. It takes, as input, a link-grammar
> parse, and turns it into a graph. It then applies a sequence of rules
> to transform the graph. For example: 'if this word has an S link to the
> right, and it's a noun, then it is the subject of a verb phrase.' Some
> rules generate intermediate markup, which later rules use to make a
> final determination.
>
> Perhaps it's possible to write rules to explicitly state contexts, and
> add the verb-class tags?

It might be. One result of my thesis might be just such a set of
rules, but my hunch is that the rules required are sufficiently
complicated that it will take Machine Learning to tease them out. I
will keep you informed of my progress though.

Richard

Linas Vepstas

May 6, 2011, 7:34:20 PM

to link-g...@googlegroups.com

On 6 May 2011 16:53, Richard <r.ke...@gmail.com> wrote:

>> That's a bug. Provide a detailed bug report, I can try to fix this.
>
> What details are you looking for?

the list you provided is good...

>
>> > As promised, here is a short description of my work. I'm trying to put
>> > verbs into general classes, like 'Activity', 'Event', 'Process',
>> > 'State', etc (I have 8 in total). The classes are mostly based on
>> > Aspectual considerations, which is why I'm interested in verb tenses.
>> > I want to use distributional analysis (using SuperMatrix) to try and
>> > classify verbs, and I'm using Relex to extract the contexts that verbs
>> > appear in. Contexts, in my case, are the relationships the verb
>> > appears in, such as: subj:name, obj:band, to:portion, and so on.
>>
>> Ohh! I like this! You realize, of course, that it might be possible
>> to get relex to spit out these classifications directly?
>>
>> Let me describe how relex works. It takes, as input, a link-grammar
>> parse, and turns it into a graph. It then applies a sequence of rules
>> to transform the graph. For example: 'if this word has an S link to the
>> right, and it's a noun, then it is the subject of a verb phrase.' Some
>> rules generate intermediate markup, which later rules use to make a
>> final determination.
>>
>> Perhaps it's possible to write rules to explicitly state contexts, and
>> add the verb-class tags?
>
> It might be. One result of my thesis might be just such a set of
> rules, but my hunch is that the rules required are sufficiently
> complicated that it will take Machine Learning to tease them out. I
> will keep you informed of my progress though.

Well, the general way of going about this is to find some simple
example that can be worked by hand, and then make sure that the
machine-learning algo can solve it as well.

In general, all of the various rules etc. for parsing should be found
via machine learning; maintaining these by hand is a losing game
in the long run. However, 'machine learning' in linguistics is easier said
than done...

--linas
