Hi!
It depends. In the current publicly-available dictionaries, the costs are not computed; they are numbers assigned by hand by linguists, to indicate that some parses are less likely than others. In some private, experimental dictionaries, they are calculated.
The calculated costs are in fact (minus) the mutual information between a word and its disjunct (the grammatical context of the word). You can think of this context -- the disjunct -- as an n-gram or as a skip-gram. This is why LG somewhat resembles neural-net, deep-learning models. However, LG remains symbolic, and disjuncts allow whole-sentence parses. Neural nets are unable to expose the symbolic relationships between words; they fail to encode grammar as such, encoding only grammatical fragments via n-grams and skip-grams.
Consider
linkparser> !disj
Display of disjuncts used turned on.
linkparser> Pierre kicked the ball
Found 4 linkages (4 had no P.P. violations)
Linkage 1, cost vector = (UNUSED=0 DIS= 0.00 LEN=6)
+-------->WV------->+-----Os-----+
+--->Wd---+---Ss*s--+ +Ds**c+
| | | | |
LEFT-WALL Pierre.m kicked.v-d the ball.s
LEFT-WALL 0.000 hWd+ hWV+ RW+
Pierre.m 0.000 Wd- Ss*s+
kicked.v-d 0.000 S- dWV- O+
the 0.000 D+
ball.s 0.000 Ds**c- Os-
RIGHT-WALL 0.000 RW-

The disjunct on "kicked" is (S- & dWV- & O+). If this word-disjunct pair (w,d) has been observed N(w,d) times, then the mutual information is
MI(w,d) = log_2 [ N(w,d) N(*,*) / (N(w,*) N(*,d)) ]
where N(*,*) is the total number of observations of word-disjunct pairs, N(w,*) is the number of times the word w has been observed with any disjunct, and N(*,d) is likewise for the disjunct d. Note that P(w,*) = N(w,*) / N(*,*) is the frequentist probability.
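For concreteness, here is a minimal Python sketch of that formula. The counts below are made up purely for illustration; they are not taken from any actual dictionary.

from math import log2

# Hypothetical observation counts N(w,d) for a few word-disjunct pairs.
counts = {
    ("kicked.v-d", "S- & dWV- & O+"): 1200,
    ("kicked.v-d", "S- & dWV-"):       300,
    ("ball.s",     "Ds**c- & Os-"):    900,
    ("ball.s",     "Os-"):             150,
}

# Marginals N(w,*), N(*,d) and the grand total N(*,*).
n_total = sum(counts.values())
n_word, n_disj = {}, {}
for (w, d), n in counts.items():
    n_word[w] = n_word.get(w, 0) + n
    n_disj[d] = n_disj.get(d, 0) + n

def mi(w, d):
    """MI(w,d) = log_2 [ N(w,d) N(*,*) / (N(w,*) N(*,d)) ]"""
    return log2(counts[(w, d)] * n_total / (n_word[w] * n_disj[d]))

print(mi("kicked.v-d", "S- & dWV- & O+"))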
The MI is sometimes called the "mutual entropy", or, more simply, "the entropy". It is the Shannon entropy, written in such a way as to make it clear what fraction of the entropy comes from the relationships between things.
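To see that concretely: if P(w,d) = N(w,d) / N(*,*) is the joint distribution, then the average of the per-pair MI is exactly H(W) + H(D) - H(W,D), i.e. the part of the total Shannon entropy that is due to the word-disjunct correlations. A self-contained toy check (the 2x2 distribution below is made up):

from math import log2

# Made-up joint distribution P(w,d) over two words and two disjuncts,
# together with its marginals P(w,*) and P(*,d).
p  = {("w1", "d1"): 0.4, ("w1", "d2"): 0.1,
      ("w2", "d1"): 0.1, ("w2", "d2"): 0.4}
pw = {"w1": 0.5, "w2": 0.5}
pd = {"d1": 0.5, "d2": 0.5}

def H(dist):
    return -sum(q * log2(q) for q in dist.values())

# Average per-pair MI, weighted by the joint distribution.
avg_mi = sum(q * log2(q / (pw[w] * pd[d])) for (w, d), q in p.items())
print(avg_mi, H(pw) + H(pd) - H(p))   # the two numbers agree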
The MI encodes how likely two things are to occur together. Things that occur together more often than their individual frequencies would predict have a large, positive MI. Things that occur together at random, for no particular reason, have a low or even negative MI (yes, MI can go negative). If you sample about a million pairs, the MI typically ranges from about -6 to about +20, and often has the distribution of a Bell curve (centered near +4). This is true not just for words and grammars, but also for interacting proteins and genes in microbiology. I guess it's a generic property of interacting networks; however, I have yet to find any scientific, mathematical explanation of this. It is probably a generic property of any "1/f" or "scale-free" network, or something like that; fractal, I guess. But without a detailed derivation, it seems to be an open "science problem".
Anyway, the most accurate parse seems to be the one with the largest grand-total MI (and thus, the "costs" are minus the MI).
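As a rough sketch of how that ranking works: give each disjunct a cost(w,d) = -MI(w,d), add up the costs of the disjuncts used in each candidate parse (the DIS= figure in the cost vector above), and prefer the smallest total. The cost table and the candidate parses below are invented for illustration only:

# Hypothetical per-disjunct costs, cost(w,d) = -MI(w,d).
cost = {
    ("kicked.v-d", "S- & dWV- & O+"): -2.1,
    ("kicked.v-d", "S- & dWV-"):       0.7,
    ("ball.s",     "Ds**c- & Os-"):   -1.5,
    ("ball.s",     "Os-"):             0.3,
}

def total_cost(parse):
    """Grand-total cost of a parse, given as a list of (word, disjunct) pairs."""
    return sum(cost[wd] for wd in parse)

# Two imaginary candidate parses; the lowest total cost
# (i.e. the largest grand-total MI) is ranked first.
parses = [
    [("kicked.v-d", "S- & dWV- & O+"), ("ball.s", "Ds**c- & Os-")],
    [("kicked.v-d", "S- & dWV-"),      ("ball.s", "Os-")],
]
best = min(parses, key=total_cost)
print(best, total_cost(best))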
-- linas