How do you interpret "cost"?


Mike Dowd

Jan 11, 2022, 9:57:54 PM
to link-grammar
Would you say that a linkage with low cost has a clear and obvious parse?
A simple parse?
A high probability of being correct?

I'm wondering what I get if I limit cost to something small like 0.5 or even 0.

Linas Vepstas

Jan 11, 2022, 10:13:19 PM
to link-grammar
The intent is that the higher the cost, the less likely that the parse
is correct.

The cost appears to behave like (minus) the logarithm of the
probability. In particular, cost is additive (which is how logarithms
behave).
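
Roughly, and only as an illustrative sketch (it assumes a base-2 logarithm and treats cost as an exact minus-log-probability, which is only approximately true), the additivity looks like this in Python:

def cost_to_prob(cost):
    # Illustrative assumption: cost == -log2(probability)
    return 2.0 ** (-cost)

c1, c2 = 1.0, 0.5
total_cost = c1 + c2                                  # costs add
total_prob = cost_to_prob(c1) * cost_to_prob(c2)      # probabilities multiply
assert abs(cost_to_prob(total_cost) - total_prob) < 1e-12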

Sometimes, the best parse still has a high cost: that means simply
that there are no parses with lower cost. It does not (necessarily!?)
mean that the parse is bad. There are many good/excellent parses with
a cost greater than 2. Due to the way things work, some parses have a
cost of less than zero. Negative costs are used to break certain ties.

The parser prints parses in cost-sorted order. If two parses have
nearly the same cost, they are equally likely to be correct. Parses
whose cost is much larger than the lowest cost are much less likely
to be correct.
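
To make "much less likely" concrete, here is a small sketch; the parse names and costs are made up, and the base-2 conversion is the same illustrative assumption as above:

# Hypothetical (parse, cost) pairs, as the parser might report them.
parses = [("parse A", 1.0), ("parse B", 1.1), ("parse C", 4.0)]
parses.sort(key=lambda pc: pc[1])          # cost-sorted order
best_cost = parses[0][1]
for name, cost in parses:
    # Likelihood relative to the best parse: 2^-(cost - best_cost)
    print(name, cost, 2.0 ** (best_cost - cost))
# parse B is nearly as likely as parse A; parse C is about 1/8 as likely.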

-- linas


--
Patrick: Are they laughing at us?
Sponge Bob: No, Patrick, they are laughing next to us.

hak...@gmail.com

Jan 12, 2022, 2:20:35 AM
to link-g...@googlegroups.com
Hello Linas,
Is the cost calculation based on the Shannon entropy formula?


Happy new year :)
  py



--
Cordialement
                  py

hak...@gmail.com

Jan 12, 2022, 2:21:20 AM
to link-g...@googlegroups.com
[image attachment: image.png]
--
Cordialement
                  py

Mike Dowd

Jan 12, 2022, 6:48:08 PM
to link-grammar
Thank you.

Linas Vepstas

Jan 13, 2022, 12:54:35 PM
to link-grammar
Hi!

It depends.  In the current publicly-available dictionaries, the costs are not computed; they are linguist-assigned numbers that indicate that some parses are less likely than others.  In some private, experimental dictionaries, they are calculated.

The calculated costs are in fact (minus) the mutual information between the word and the disjunct (the grammatical context of the word). You can think of this context -- the disjunct -- as an n-gram or as a skip-gram. This is why LG somewhat resembles the neural-net, deep-learning models. However, LG remains symbolic, and disjuncts allow whole-sentence parses. Neural nets are unable to expose the symbolic relationships between words. Neural nets fail to encode the grammar as a whole; they only encode grammatical fragments, via N-grams and skip-grams.

Consider
linkparser> !disj
Display of disjuncts used turned on.
linkparser> Pierre kicked the ball
Found 4 linkages (4 had no P.P. violations)
Linkage 1, cost vector = (UNUSED=0 DIS= 0.00 LEN=6)

    +-------->WV------->+-----Os-----+
    +--->Wd---+---Ss*s--+      +Ds**c+
    |         |         |      |     |
LEFT-WALL Pierre.m kicked.v-d the ball.s

            LEFT-WALL     0.000  hWd+ hWV+ RW+
             Pierre.m     0.000  Wd- Ss*s+
           kicked.v-d     0.000  S- dWV- O+
                  the     0.000  D+
               ball.s     0.000  Ds**c- Os-
           RIGHT-WALL     0.000  RW-


The disjunct on "kicked" is (S- & dWV- & O+).  If this word-disjunct pair (w,d) has been observed N(w,d) times, then the mutual information is

MI(w,d) = log_2 [ N(w,d) N(*,*) / ( N(w,*) N(*,d) ) ]

where N(*,*) is the total number of observations of any word-disjunct pair, N(w,*) is the number of times the word w has been observed with any disjunct, and N(*,d) is defined likewise. Note that P(w,*) = N(w,*) / N(*,*) is the frequentist probability.
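
As a sketch, computing that from raw counts looks like this; the counting table below is hypothetical, and in practice the counts come from observing word-disjunct pairs in a large parsed corpus:

from math import log2
from collections import Counter

# Hypothetical observation counts N(w,d) of (word, disjunct) pairs.
pair_counts = Counter({
    ("kicked.v-d", "S- & dWV- & O+"): 50,
    ("kicked.v-d", "Ss*s- & O+"): 12,
    ("ball.s", "Ds**c- & Os-"): 40,
    ("ball.s", "Os-"): 7,
})

def mutual_information(w, d, counts):
    n_wd  = counts[(w, d)]                                      # N(w,d)
    n_w   = sum(c for (wi, _), c in counts.items() if wi == w)  # N(w,*)
    n_d   = sum(c for (_, di), c in counts.items() if di == d)  # N(*,d)
    n_all = sum(counts.values())                                # N(*,*)
    # MI(w,d) = log2( N(w,d) N(*,*) / (N(w,*) N(*,d)) )
    return log2(n_wd * n_all / (n_w * n_d))

print(mutual_information("kicked.v-d", "S- & dWV- & O+", pair_counts))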

The MI is sometimes called the "mutual entropy", or, more simply, "the entropy". It is the Shannon entropy, written in such a way as to make it clear what fraction of the entropy comes from the relationships between things.

The MI encodes how likely two things are to occur together. Things that occur together more often than they occur individually have a large positive MI. Things that occur together randomly, for no particular reason, have a low or even negative MI (yes, MI can go negative). If you sample about a million pairs, the MI typically ranges from about -6 to about +20, and often has the distribution of a Bell curve (centered near +4). This is true not just for words and grammars, but also for interacting proteins and genes in microbiology. I guess it's a generic property of interacting networks; however, I have yet to find any scientific, mathematical explanation of this. It is probably a generic property of any "1/f" or "scale-free" network, or something like that; fractal, I guess. But without a detailed derivation, it seems to be an open "science problem".

Anyway, the most accurate parse seems to be the one with the largest grand-total MI (and thus, the "costs" are minus the MI).
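
As a last sketch (with made-up per-word MI values), the bookkeeping is just:

# Hypothetical MI(w,d) for each word-disjunct pair in one linkage.
word_disjunct_mi = {"Pierre.m": 3.2, "kicked.v-d": 4.1, "the": 1.5, "ball.s": 2.8}
per_word_cost = {w: -mi for w, mi in word_disjunct_mi.items()}   # cost = -MI
total_cost = sum(per_word_cost.values())
# Maximizing the grand-total MI is the same as minimizing the total cost.
print(total_cost)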

-- linas

