Naive question about dictionary


Paul McQuesten

May 20, 2009, 3:59:59 PM
to link-grammar
Apologies in advance if this is spam, or if there is a more
appropriate forum. Also, I am asking about language issues that I
know nothing of, so feel free to ignore this.

The sentence
"Alice ate the mushroom."
gets two parses, of equal confidence, that differ only in that one has
'mushroom.s' and the other has 'mushroom.p'.

The latter word is present in link grammar data/en/words/words.n.4. If
I understand correctly, that means that 'mushroom' can be a mass noun.
I have some questions:

1) The Alice sentence tempts me to say that a rule should be written
so that a determiner makes 'mushroom' singular.

2) Is it a mass noun in order to allow, e.g., 'Consider the mushroom.'
or 'The mushroom is a delicacy to X.'?

3) Who maintains the Link Grammar dictionary? Is there a forum? A
place to report (suspected) bugs? A place to learn more?

TIA
~Paul

Linas Vepstas

May 20, 2009, 4:25:43 PM
to link-g...@googlegroups.com
Hi Paul,

2009/5/20 Paul McQuesten <mcqu...@gmail.com>:


>
> Apologies in advance if this is spam, or if there is a more
> appropriate forum. Also, I am asking about language issues that I
> know nothing of, so feel free to ignore this.
>
> The sentence
>   "Alice ate the mushroom."
> gets two parses, of equal confidence, that differ only in that one has
> 'mushroom.s' and the other has 'mushroom.p'.

Sorry, yes, some of this kind of stuff probably
should be cleaned up.

> The latter word is present in link grammar data/en/words/words.n.4. If
> I understand correctly, that means that 'mushroom' can be a mass noun.
> I have some questions:
>
> 1) The Alice sentence tempts me to say that a rule should be written
> so that a determiner makes 'mushroom' singular.

"Alice stepped on the hot sand"

> 2) Is it a mass noun in order to allow, eg, 'Consider the mushroom.'
> or 'The mushroom is a delicacy to X.' ??

"Alice tripped, fell, and ate the sand."

> 3) Who maintains the Link Grammar dictionary?

Mostly just me. I've gotten updates/changes/suggestions
from about a half-dozen contributors over the last year
or so.

> Is there a forum?

This mailing list is it.

> A
> place to report (suspected) bugs?

You should probably mention them by email.

You can also report bugs at the google-code bug tracker
at http://code.google.com/p/link-parser/issues/list?ts=1201707275
but they will likely be ignored for some indefinite
period of time. The core problem is that I'm aware
of a lot of them, and fixing just one bug can take
hours or days. Fixing even the simplest bugs takes
a fair amount of linguistic sophistication (i.e. being
on your toes, and testing other, related uses).

I've got a collection of failing (and good) parses in
the data/en/*batch files.

I have some vague hopes of using automated
methods to improve the quality and coverage of
the parser, but at the current rate of progress,
strong results are years away ... (although I've
been able to make some minor improvements in
a semi-automated way).

> A place to learn more?

There is actually a *lot* of documentation for
link-grammar, which most people don't read...
but sooner or later, bug fixing would get you
quite acquainted with it.

--linas

Paul McQuesten

May 21, 2009, 1:54:36 AM
to link-grammar
> > The latter word is present in link grammar data/en/words/words.n.4. If
> > I understand correctly, that means that 'mushroom' can be a mass noun.
> > I have some questions:

> > 1) The Alice sentence tempts me to say that a rule should be written
> > so that a determiner makes 'mushroom' singular.

> "Alice stepped on the hot sand"
But "Alice stepped on the mushroom" is singular.
But I understand that any such rule would have to affect only some
words.

> > 2) Is it a mass noun in order to allow, eg, 'Consider the mushroom.'
> > or 'The mushroom is a delicacy to X.' ??

> "Alice tripped, fell, and ate the sand."
"Alice tripped, fell, and ate the mushroom."--singular
So are you agreeing that 'mushroom.p' should go away?

> > 3) Who maintains the Link Grammar dictionary?
>
> Mostly just me.  I've gotten updates/changes/suggestions
> from about a half-dozen contributors over the last year
> or so.
And thank you for your efforts. Sincerely.

> ... a lot of them, and fixing just one bug can take
> hours or days.   Fixing even the simplest bugs takes
> a  fair amount of linguistic sophistication (i.e. being
> on your toes, and testing other, related uses).
Amen. I know that I do not have that sophistication, so I am glad that
there are you folks who do. My question was not because I really need
mushrooms to work well, but more to check my understanding.

> There is actually a *lot* of documentation for link-grammar, which most people don't read...
> but sooner or later, bug fixing would get you quite acquainted with it.
I am certainly not qualified to mess with any of the linguistic
processes/data. I was just trying to figure out how to use
'inflection-TAG'. An hour of googling did not do it for me, so if you
have a link that would be nice. I finally looked in the code and found
the switch statement in LinkableView.java, setWordAndPos. My question
was mostly to see if I was reading it correctly. OTOH, [cdikq] are
marked as questionable in there. Where would I go to find out whether
I should care?

Thanks a bunch for taking the time for such an informative reply.
~Paul

Linas Vepstas

May 21, 2009, 3:15:13 PM
to link-g...@googlegroups.com
Hi Paul,

2009/5/21 Paul McQuesten <mcqu...@gmail.com>:


>
>> > The latter word is present in link grammar data/en/words/words.n.4. If
>> > I understand correctly, that means that 'mushroom' can be a mass noun.
>> > I have some questions:
>
>> > 1) The Alice sentence tempts me to say that a rule should be written
>> > so that a determiner makes 'mushroom' singular.
>
>> "Alice stepped on the hot sand"
> But "Alice stepped on the mushroom" is singular.

Well, you implied that the determiner "the" was enough
to imply that the word that followed was singular -- clearly
it's not.

From the purely syntactical point of view, we cannot tell,
based on syntax alone, if "mushroom" is mass or count.
Just think: "Alice stepped on the flarbleblob" -- is
flarbleblob used as a mass or count noun?

>> > 2) Is it a mass noun in order to allow, eg, 'Consider the mushroom.'
>> > or 'The mushroom is a delicacy to X.' ??
>
>> "Alice tripped, fell, and ate the sand."
> "Alice tripped, fell, and ate the mushroom."--singular
> So are you agreeing that 'mushroom.p' should go away?

Well, it would probably be mushroom.s that should go
away, since mushroom.p is probably (I'd need to check)
usable as both mass and count, not just count. For
example:

"I ate pizza with mushroom" (no s, so not plural), which
is a valid use of the mass noun mushroom, just like
"I ate pizza with spinach"

I'll look at this a bit later today.

>> ... a lot of them, and fixing just one bug can take
>> hours or days.   Fixing even the simplest bugs takes
>> a  fair amount of linguistic sophistication (i.e. being
>> on your toes, and testing other, related uses).
> Amen. I know that I do not have that sophistication,

The basics are not that hard, for native English speakers.
Mostly you have to try a large number of similar,
grammatical sentences, and also try many similar,
ungrammatical ones.

>> There is actually a *lot* of documentation for  link-grammar, which most people don't read...
>> but sooner or later, bug fixing would get you quite acquainted with it.
> I am certainly not qualified to mess with any of the linguistic
> processes/data. I was just trying to figure out how to use 'inflection-
> TAG'.

A bit misnamed, I should have called it "subscript-TAG"
See section 3.4 of
http://www.abisource.com/projects/link-grammar/dict/introduction.html#3

> [cdikq] are marked as
> questionable in there. Where would I go to find out whether I should
> care?

The subscripts/"inflections" are not "carved in stone".
The use of .n, .v, .a and .e is standardized and guaranteed,
but some of the others are not. I think that .x, .y and .u
should be stable, as would .b, .c, .f, .g, .l and .m. Maybe also
.s, .p and .t. But the dictionaries do contain some "wild"
usages of some of the subscripts that might break some
of these categorizations. (Some of the "not used"
subscripts might actually be used.)

The subscripts can be multi-letter, and I've been toying
with the idea of extending them to multi-letter values,
for various refined uses.
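
For anyone consuming subscripted words such as 'mushroom.s' from parser or ReLex output, a minimal Python sketch of splitting off the subscript might look like the following. The name split_subscript, and the assumption that a subscript is the run of lowercase letters after the last '.', are illustrative only and are not taken from the link-grammar or ReLex code.

```python
from typing import List, Optional, Tuple


def split_subscript(token: str) -> Tuple[str, Optional[str]]:
    """Split 'mushroom.s' into ('mushroom', 's').

    Assumption (not from the link-grammar sources): the subscript is the
    text after the last '.', and consists of lowercase letters only.
    Tokens without such a suffix are returned unchanged with None.
    """
    word, sep, subscript = token.rpartition(".")
    if sep and word and subscript.isalpha() and subscript.islower():
        return word, subscript
    return token, None


def strip_subscripts(tokens: List[str]) -> List[str]:
    """Drop subscripts so two parses can be compared word by word."""
    return [split_subscript(t)[0] for t in tokens]


if __name__ == "__main__":
    # Illustrative token lists for the two parses discussed in this thread.
    parse_a = ["Alice", "ate", "the", "mushroom.s", "."]
    parse_b = ["Alice", "ate", "the", "mushroom.p", "."]
    print(split_subscript("mushroom.s"))
    print(strip_subscripts(parse_a) == strip_subscripts(parse_b))
```

Since only some subscripts are considered stable, code like this should probably treat an unrecognized subscript as "unknown" rather than fail.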

--linas

Linas Vepstas

May 21, 2009, 3:48:57 PM
to link-g...@googlegroups.com
2009/5/21 Linas Vepstas <linasv...@gmail.com>:

> Hi Paul,
>
> 2009/5/21 Paul McQuesten <mcqu...@gmail.com>:
>>
>>> > The latter word is present in link grammar data/en/words/words.n.4. If
>>> > I understand correctly, that means that 'mushroom' can be a mass noun.
>>> > I have some questions:
>>
>>> > 1) The Alice sentence tempts me to say that a rule should be written
>>> > so that a determiner makes 'mushroom' singular.
>>
>>> "Alice stepped on the hot sand"
>> But "Alice stepped on the mushroom" is singular.
>
> Well, you implied that the determiner "the" was enough
> to imply that the word that followed was singular -- clearly
> it's not.
>
> From the purely syntactical point of view, we cannot tell,
> based on syntax alone, if "mushroom" is mass or count.
> Just think: "Alice stepped on the flarbleblob" -- is
> flarbleblob used as a mass or count noun?

I guess that what you are saying is that "if a noun
has both mass and count forms, *and* it is used
with a determiner, then it should be understood to
be the count form". That sounds right. Yes, the dicts
could be modified to handle this case.
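
The rule as restated above can be sketched in a few lines of Python. This is only an illustration of the logic, not how the dictionaries would express it (that would be done with connectors in the dictionary entries); the function name and the "mass"/"count" labels are made up for the example.

```python
from typing import Set


def preferred_readings(readings: Set[str], has_determiner: bool) -> Set[str]:
    """Narrow a noun's readings using the determiner rule.

    readings is the set of readings the dictionary allows for the noun,
    a subset of {"mass", "count"} (hypothetical labels for this sketch).
    The rule only fires when BOTH readings are available and the noun is
    used with a determiner; otherwise the readings are left unchanged.
    """
    if has_determiner and readings == {"mass", "count"}:
        return {"count"}
    return set(readings)


if __name__ == "__main__":
    # "Alice ate the mushroom": both readings exist, determiner present.
    print(preferred_readings({"mass", "count"}, has_determiner=True))
    # "Alice stepped on the hot sand": mass-only noun, rule does not fire.
    print(preferred_readings({"mass"}, has_determiner=True))
    # "I ate pizza with mushroom": no determiner, both readings remain.
    print(preferred_readings({"mass", "count"}, has_determiner=False))
```

Note that the rule leaves mass-only nouns like "sand" alone, which is what keeps "the hot sand" from being forced into a count reading.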

In fact, there's a fair amount of semantic information
that can be deduced from syntax, and the rules could
be extended to reflect that. The original authors noted
this in their first paper... and so it's stayed, at this level.

FWIW, I have been playing with automatic, high-speed
word-sense disambiguation, using wordnet to tag
certain grammatical constructions with word-senses.
I'm doing this by accumulating a lot of statistics .. I think
the technique works well, but am still evaluating accuracy.

(in principle, one could tag from other ontologies or
triple-stores or w3c semantic-net OWL databases or
even cyc labels ... just that wordnet was the most
convenient).

--linas

Paul McQuesten

May 22, 2009, 9:57:41 PM
to link-grammar

On May 21, 12:48 pm, Linas Vepstas <linasveps...@gmail.com> wrote:
> 2009/5/21 Linas Vepstas <linasveps...@gmail.com>:

> > 2009/5/21 Paul McQuesten <mcques...@gmail.com>:
>
> >>> ... that means that 'mushroom' can be a mass noun.
> >>> 1) The Alice sentence tempts me to say that a rule should be written
> >>> so that a determiner makes 'mushroom' singular.
>
> >> "Alice stepped on the hot sand"
> > But "Alice stepped on the mushroom" is singular.
>
> Well, you implied that the determiner "the" was enough
> to imply that the word that followed was singular -- clearly it's not.
>
> From the purely syntactical point of view, we cannot tell,
> based on syntax alone, if "mushroom" is mass or count.
> Just think: "Alice stepped on the flarbleblob" -- is
> flarbleblob used as a mass or count noun?

> "I ate pizza with mushroom" (no s, so not plural), which
> is a valid use of the mass noun mushroom, just like
> "I ate pizza with spinach"

But mass-only nouns don't go the other way:
"I ate pizza with the mushroom."
*"I ate pizza with the spinach."
"I had pizza with the best spinach." (adjective makes it okay?)
"I had mushrooms with the spinach." (okay because it is shorthand for
"I had mushrooms with the spinach dish."?)

> I guess that what you are saying is that "if a noun
> has both mass and count forms, *and* it is used
> with a determiner, then it should be understood to
> be the count form". That sounds right. Yes, the dicts
> could be modified to handle this case.

That sounds great, but I was actually not near figuring it out. I just
meant that the word 'mushroom' should be so treated, leaving an
exercise for the reader to discover other such words. Sounds like you
discovered all of them!

...
> There is actually a *lot* of documentation for link-grammar, which most people don't read...
> but sooner or later, bug fixing would get you quite acquainted with it.

I can only plead partial brain-death. I found the subscript
documentation on a page that I had been skipping in my search because
I thought I knew what was on it. Apologies.

>The subscripts/"inflections" are not "carved in stone".
>The use of .n, .v, .a and .e is standardized and guaranteed...

My original query was ill-founded. I was looking at ReLex output
trying to figure out what to use each piece for, in particular the
inflection-TAG. It is now clear that the subscripts are an internal
mechanism of LG, and one of the ReLex options is to output them
(possibly for debugging) with no recommendation that they be used.

> In fact, there's a fair amount of semantic information
> that can be deduced from syntax, and the rules could
> be extended to reflect that.  The original authors noted
> this in their first paper... and so it's stayed, at this level.

> FWIW, I have been playing with automatic, high-speed
> word-sense disambiguation, using wordnet to tag
> certain grammatical constructions with word-senses.
> I'm doing this by accumulating a lot of statistics .. I think
> the technique works well, but am still evaluating accuracy.
>
> (in principle, one could tag from other ontologies or
> triple-stores or w3c semantic-net OWL databases or
> even cyc labels ... just that wordnet was the most
> convenient).

Can you elaborate on "certain grammatical constructions"? What kind of
information does WordNet give you that the LG dictionaries are
missing? I have been messing with Yago
(http://www.mpi-inf.mpg.de/yago-naga/yago/), so would like to
understand more.

~Paul

Linas Vepstas

May 22, 2009, 10:54:59 PM
to link-g...@googlegroups.com
2009/5/22 Paul McQuesten <mcqu...@gmail.com>:

>
> But mass-only nouns don't go the other way:

That depends.

>  "I ate pizza with the mushroom."
>  *"I ate pizza with the spinach."

Yes, a more clever parser might recognize this as ungrammatical, while
still allowing

"I ate the pizza with the spinach on it".

>  "I had pizza with the best spinach."       adjective makes it okay?
>  "I had mushrooms with the spinach."    okay because it is shorthand
> for
>  "I had mushrooms with the spinach dish."?

Exactly what is considered to be grammatical,
and what's not, can be quite gray, and highly
speaker-dependent. People who design NLP
parsers want to prove that they've accomplished
something by rejecting ungrammatical sentences.
People who want to use NLP parsers as grammar
checkers prefer strict parsers.

People who do AI work tend to want forgiving parsers,
so that they can handle medical literature, doctor's
notes, English-as-second-language speakers,
IRC chat logs, etc.

There's no particularly easy way to accommodate
both camps, although various scoring mechanisms
might be able to fill in the gray areas.
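
One way to picture such a scoring mechanism is a per-parse cost with a caller-chosen threshold: a grammar checker sets the threshold low, an AI application sets it high. The Python sketch below is only an illustration under that assumption; the class, the function, and the cost values are invented for the example and do not describe the parser's actual scoring.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredParse:
    sentence: str
    cost: float  # hypothetical penalty: 0 = unremarkable, higher = more dubious


def acceptable(parses: List[ScoredParse], max_cost: float) -> List[ScoredParse]:
    """Return the parses with cost <= max_cost, cheapest first."""
    kept = [p for p in parses if p.cost <= max_cost]
    return sorted(kept, key=lambda p: p.cost)


if __name__ == "__main__":
    # Costs here are made up purely to show the two modes.
    candidates = [
        ScoredParse("I ate pizza with the spinach.", cost=2.0),
        ScoredParse("I ate the pizza with the spinach on it.", cost=0.0),
    ]
    strict = acceptable(candidates, max_cost=0.0)     # grammar-checker mode
    forgiving = acceptable(candidates, max_cost=5.0)  # forgiving mode
    print([p.sentence for p in strict])
    print([p.sentence for p in forgiving])
```

The same candidate list serves both camps; only the threshold differs.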

>>The subscripts/"inflections" are not "carved in stone".
>>The use of .n, .v, .a and .e is standardized and guaranteed...
>
> My original query was ill-founded. I was looking at ReLex output
> trying to figure out what to use each piece for, in particular the
> inflection-TAG. It is now clear that the subscripts are an internal
> mechanism of LG, and one of the ReLex options is to output them
> (possibly for debugging) with no recommendation that they be used.

I wouldn't recommend using them for anything critical.
But they are useful, and I won't discourage their use
either. Just caveat emptor.

>> In fact, there's a fair amount of semantic information
>> that can be deduced from syntax, and the rules could
>> be extended to reflect that.  The original authors noted
>> this in their first paper... and so it's stayed, at this level.
>
>> FWIW, I have been playing with automatic, high-speed
>> word-sense disambiguation, using wordnet to tag
>> certain grammatical constructions with word-senses.
>> I'm doing this by accumulating a lot of statistics .. I think
>> the technique works well, but am still evaluating accuracy.
>>
>> (in principle, one could tag from other ontologies or
>> triple-stores or w3c semantic-net OWL databases or
>> even cyc labels ... just that wordnet was the most
>> convenient).
>
> Can you elaborate on "certain grammatical constructions"?

Ah, for example:

suffers.v which has link-grammar disjunct
Ss- Os+ MVp+ which correlates very highly with
wordnet sense get%2:29:11:: aka suffer%2:29:01::

This is because sentences like
"She suffered a fracture in the accident"
"He suffered an insulin shock after eating three candy bars"

all link suffer with Ss- Os+ MVp+

This is clearly a different meaning than:
"The new secretary had to suffer a lot of unprofessional remarks" (which is I- Op+)
"She was suffering after the accident"
"She is suffering from the hot weather"
"She suffers from a tendency to talk too much" (which is Ss- MVp+ MVp+, or MVa+, MVi+, etc.)
"She suffered a terrible fate." (Ss- Os+)
"This author really suffers in translation" (Ss- MVp+)

I blog more about this at

http://brainwave.opencog.org/2009/01/12/determining-word-senses-from-grammatical-usag

and plan to turn it into a real academic paper
someday when I get the time.
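
To make the "accumulating statistics" idea concrete, here is a minimal Python sketch that counts how often each (word, disjunct) pair is seen with each sense, then picks the most frequent sense. It is a toy under stated assumptions: the function names, the observation counts, and the placeholder label "some-other-sense" are invented; only the word, the disjuncts and the sense key suffer%2:29:01:: come from the text above.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, Optional, Tuple

Observation = Tuple[str, str, str]  # (word, disjunct, sense)
SenseTable = DefaultDict[Tuple[str, str], Counter]


def build_sense_table(observations: Iterable[Observation]) -> SenseTable:
    """Count how often each sense co-occurs with each (word, disjunct)."""
    table: SenseTable = defaultdict(Counter)
    for word, disjunct, sense in observations:
        table[(word, disjunct)][sense] += 1
    return table


def most_likely_sense(table: SenseTable, word: str, disjunct: str) -> Optional[str]:
    """Return the most frequently observed sense, or None if never seen."""
    counts = table.get((word, disjunct))
    if not counts:
        return None
    sense, _count = counts.most_common(1)[0]
    return sense


if __name__ == "__main__":
    # Made-up training observations, for illustration only.
    observations = [
        ("suffers.v", "Ss- Os+ MVp+", "suffer%2:29:01::"),
        ("suffers.v", "Ss- Os+ MVp+", "suffer%2:29:01::"),
        ("suffers.v", "Ss- Os+ MVp+", "some-other-sense"),
        ("suffers.v", "Ss- MVp+", "some-other-sense"),
    ]
    table = build_sense_table(observations)
    print(most_likely_sense(table, "suffers.v", "Ss- Os+ MVp+"))
```

A real system would presumably also keep the counts around as a confidence measure rather than returning only the top sense.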

> What kind of
> information does WordNet give you that the LG dictionaries are
> missing?

The two dictionaries are very different, and serve
different purposes. The link-grammar dictionaries
are more or less insensitive to semantics, and deal
only with the manner in which words are allowed to
syntactically arrange.

Wordnet has no syntactic info in it at all, but makes very
fine-grained semantic distinctions. It also has a built-in
ontology (is-a, kind-of, has-a, part-of, etc relations)

A worthy linguistic task these days is "word-sense
disambiguation": assigning dictionary senses to
words in text. For historical/technical reasons,
WordNet tends to be the dictionary used for
academic study.

> I have been messing with Yago
> (http://www.mpi-inf.mpg.de/yago-naga/yago/), so would like to
> understand more.

I've been fiddling with "learning common sense
knowledge from reading". For example, the MIT
OpenMind/CommonSense project has a collection
of English sentences like "water is something you
swim in". I can convert these (with link-grammar +
relex + other post processing) into triples such as
"swim_in (water, you)" .. well, bad example, as it begs
the question of "what's you?" ... but you get the
idea. The canonical YAGO example "Einstein
lived in Germany" is easier.

The triple-extraction is far from perfect .. crude, even,
so the question is how to tighten it up.
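
For readers unfamiliar with triples, the Python sketch below shows the kind of output meant here. It is emphatically not the link-grammar + relex pipeline described above: a single regular expression stands in for the parser, and it only handles sentences of the form "X lived in Y". The Triple class, the function name, and the argument order are assumptions made for the example.

```python
import re
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    relation: str
    arg1: str
    arg2: str

    def __str__(self) -> str:
        return f"{self.relation}({self.arg1}, {self.arg2})"


# Stand-in for real parsing: matches only "<word> lived in <word>".
_LIVED_IN = re.compile(r"^(\w+) lived in (\w+)\.?$")


def extract_lived_in(sentence: str) -> Optional[Triple]:
    """Return a lived_in triple for 'X lived in Y', else None."""
    match = _LIVED_IN.match(sentence.strip())
    if match is None:
        return None
    return Triple("lived_in", match.group(1), match.group(2))


if __name__ == "__main__":
    print(extract_lived_in("Einstein lived in Germany"))
    print(Triple("swim_in", "water", "you"))
```

A pattern like this breaks on anything more varied, which is exactly why a real parser is used for the extraction step.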

One pending idea is to "fact check", so that a
sentence "I heard a bark in the night" would be analyzed
to determine that tree-bark is not a sound, so "bark" is
not "tree bark" in this sentence. I'd like to "fact-check"
the common-sense database against itself (without
succumbing to the temptation to use wordnet as a
crutch), and then move on to more complex tasks
(possibly using things like yago as a crutch ... or
maybe not .. haven't gotten there yet).
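
The "fact check" idea can be illustrated with a toy Python sketch: given what kind of thing a verb expects as its object, discard the candidate senses that are not that kind of thing. Everything in the two tables below is hand-written for the example (the sense labels, the categories, the requirement that "hear" takes a sound); in the scheme described above, such facts would come from the common-sense database itself.

```python
from typing import Dict, List, Set

# Hypothetical facts, hand-written for this sketch.
SENSE_CATEGORIES: Dict[str, Set[str]] = {
    "bark (dog sound)": {"sound"},
    "bark (tree covering)": {"plant part"},
}
VERB_OBJECT_REQUIREMENT: Dict[str, str] = {
    "hear": "sound",
}


def plausible_senses(verb: str, candidate_senses: List[str]) -> List[str]:
    """Keep only the senses compatible with what the verb expects.

    If nothing is known about the verb, no sense is ruled out.
    """
    required = VERB_OBJECT_REQUIREMENT.get(verb)
    if required is None:
        return list(candidate_senses)
    return [
        sense
        for sense in candidate_senses
        if required in SENSE_CATEGORIES.get(sense, set())
    ]


if __name__ == "__main__":
    senses = ["bark (dog sound)", "bark (tree covering)"]
    # "I heard a bark in the night": tree bark is not a sound.
    print(plausible_senses("hear", senses))
```

Falling back to "rule nothing out" for unknown verbs keeps the check from rejecting sentences just because the fact tables are incomplete.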

--linas
