Ben, Linas,
Let me comment on the latest results, where LG-English parses are given as input to the Grammar Learner using the Identical Lexical Entries (ILE) algorithm, and the learned grammar is evaluated against the same input LG-English parses - for the Gutenberg Children corpus with direct speech removed, using only complete LG-English parses for training and testing.
MWC - Minimum Word Count: test only on the sentences where every word in the sentence occurs the given number of times or more.
MSL - Maximum Sentence Length: test only on the sentences which have the given number of words or fewer.
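To make the two filters concrete, here is a minimal sketch of how I read the MWC/MSL selection rule (the function name and signature are my own, not the pipeline's actual code; word counts are taken over the whole corpus, and 0 disables a filter):

```python
from collections import Counter

def filter_sentences(sentences, mwc=0, msl=0):
    """Select test sentences by Minimum Word Count (MWC) and
    Maximum Sentence Length (MSL). mwc=0 / msl=0 disable that filter.
    `sentences` is a list of tokenized sentences (lists of words)."""
    # Word frequencies are computed over the entire corpus, not per sentence.
    counts = Counter(w for s in sentences for w in s)
    kept = []
    for s in sentences:
        if msl and len(s) > msl:
            continue  # sentence too long
        if mwc and any(counts[w] < mwc for w in s):
            continue  # some word is too rare
        kept.append(s)
    return kept
```

For example, with MWC=2 a sentence containing any word seen only once in the corpus is excluded from testing.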
MWC(GT) MSL(GT) PA F1
0 0 61.69% 0.65 - all input sentences are used for test
5 0 100.00% 1.00 - sentences with each word occurring 5+
10 0 100.00% 1.00 - sentences with each word occurring 10+
50 0 100.00% 1.00 - sentences with each word occurring 50+
That is:
1) For words occurring 5 or more times, recall=1.0 and precision=1.0;
2) Shorter sentences give better recall and precision.
0 5 70.06% 0.72 - sentences of 5 words and shorter
0 10 66.60% 0.69 - sentences of 10 words and shorter
0 15 63.87% 0.67 - sentences of 15 words and shorter
0 25 61.69% 0.65 - sentences of 25 words and shorter
Note:
1) The Identical Lexical Entries (ILE) algorithm is in fact "over-fitting", so there is still a way to go before we can learn "generalized grammars";
2) The same kind of experiment is still to be done with MST-Parses, and the results are not expected to be that glorious, given what we know about the Pearson correlation between F1-s on different parses ;-)
Definitions of PA and F1 are in the attached paper.
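For readers without the attachment: F1 here is presumably the standard harmonic mean of precision and recall computed over unlabelled link sets. A hedged sketch (the function name is mine; the paper's exact definitions of PA and F1 may differ in detail):

```python
def parse_quality(reference_links, test_links):
    """Precision, recall and F1 over unlabelled links for one sentence.
    Links are unordered pairs of word indices."""
    ref = {frozenset(l) for l in reference_links}
    tst = {frozenset(l) for l in test_links}
    tp = len(ref & tst)  # links the test parse got right
    precision = tp / len(tst) if tst else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1
```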
Cheers,
-Anton
--------
*Past Week:*
1. Provided data for GC for ALE and dILEd.
2. Fixed GT to allow parsing sentences starting with numbers in ULL mode.
3. Ended up with Issue #184; ran several tests for different corpora with different settings of MWC and MSL:
- Nothing interesting for POC-English;
- CDS seems to depend on the ratio of incompletely parsed sentences to completely parsed sentences in the corpus subset defined by the MWC/MSL restriction.
http://langlearn.singularitynet.io/data/aglushchenko_parses/CDS-dILEd-MWC-MSL-2019-04-13/CDS-dILEd-MWC-MSL-2019-04-13-summary.txt
- Much more reliable result is obtained on GC corpus with no direct speech.
http://langlearn.singularitynet.io/data/aglushchenko_parses/GCB-NQ-dILEd-MWC-MSL-2019-04-13/GCB-NQ-dILEd-MWC-MSL-summary.txt
4. Small improvements to the pipeline code were made.
*Next week:*
1. Resolve Issue #188
2. Resolve Issue #198
3. Resolve Issue #193
4. Pipeline improvements along the way.
Alexey
Ben,
> I'd be curious to see some examples of the sentences used in
> ***
> 5 0 100.00% 1.00 - sentences with each word occurring 5+
> 10 0 100.00% 1.00 - sentences with each word occurring 10+
> 50 0 100.00% 1.00 - sentences with each word occurring 50+
> ***
Alexey, please provide.
> So if I understand right, you're doing grammar inference here, but using link parses (with the hand-coded English grammar) as data ... right? So it's a test of how well the grammar inference methodology works if one has a rather good set of dependency linkages to work with ...?
Yes.
2) To what extent "the best of MST" parses will be worse than what we have above (in progress)
3) If we can get quality of "the best of MST" parses close to that (DNN-MI-lking, etc.)
4) If we can learn grammar in more generalized way (hundreds of rules instead of thousands)
***
Thank you! This is fairly impressive: it says that if the algo heard
a word five or more times, that was sufficient for it to deduce the
correct grammatical form!
***
Yes. What we can see overall is that, with the current algorithms
Anton's team is using: If we have "correct" unlabeled dependency
parses, then we can infer "correct" parts-of-speech and POS-based
grammatical rules... for words that occur often enough (5 times with
current corpus and parameters)
So the problem of unsupervised grammar induction is, in this sense,
reduced to the problem of getting correct-enough unlabeled dependency
parses ...
We are going to repeat the same experiment with MST-Parses during this week.
> On Mon, Apr 22, 2019 at 11:18 PM Anton Kolonin @ Gmail <akol...@gmail.com> wrote:
>>
>>
>> We are going to repeat the same experiment with MST-Parses during this week.
>
>
> The much more interesting experiment is to see what happens when you give it a known percentage of intentionally-bad unlabelled parses. I claim that this step provides natural error-reduction, error-correction, but I don't know how much.
If we assume roughly that "insufficient data" has a similar effect to
"noisy data", then the effect of adding intentionally-bad parses may
be similar to the effect of having insufficient examples of the words
involved... which we already know from Anton's experiments. Accuracy
degrades smoothly but steeply as number of examples decreases below
adequacy.
***
My claim is that this mechanism acts as an "amplifier" and a "noise
filter" -- that it can take low-quality MST parses as input, and
still generate high-quality results. In fact, I make an even
stronger claim: you can throw *really low quality data* at it --
something even worse than MST, and it will still return high-quality
grammars.
This can be explicitly tested now: Take the 100% perfect unlabelled parses, and artificially introduce 1%, 5%, 10%, 20%, 30%, 40% and 50% random errors into them. What is the accuracy of the learned grammar? I claim that you can introduce 30% errors, and still learn a grammar with greater than 80% accuracy. I claim this, I think it is a very important point -- a key point -- but I cannot prove it.
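The proposed experiment is mechanical to set up. A sketch of one way to inject a controlled error rate into gold parses (the function name and the exact corruption model are my assumptions; one could also delete or cross links instead of rewiring them):

```python
import random

def corrupt_parse(links, n_words, error_rate, seed=0):
    """Replace a fraction `error_rate` of a sentence's links with random
    word pairs, keeping the link count constant - a crude model of
    'X% intentionally-bad parses'. Links are (i, j) word-index tuples."""
    rng = random.Random(seed)  # fixed seed for reproducible experiments
    links = [tuple(l) for l in links]
    out = []
    for link in links:
        if rng.random() < error_rate:
            # draw a random pair not already present in the parse
            while True:
                i, j = rng.sample(range(n_words), 2)
                cand = (min(i, j), max(i, j))
                if cand not in links and cand not in out:
                    out.append(cand)
                    break
        else:
            out.append(link)
    return out
```

Sweeping `error_rate` over 0.01 .. 0.5 and re-running grammar learning on the corrupted parses would test the claim directly.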
***
Hmmm. So I am pretty sure you are right given enough data.
However, whether this is true given the magnitudes of data we are now
looking at (Gutenberg Childrens Corpus for example) is less clear to
me
Also the current MST parses are much worse than "30% errors" compared
to correct parses.
So even if what you say is correct, it doesn't
remove the need to improve the MST parses...
But you are right -- this will be an interesting and important set of
experiments to run. Anton, I suggest you add it to the to-do list...
-- Ben
***
Ah, well, hmm. It appears I had misunderstood. I did not realize that the input was 100% correct but unlabelled parses. In this case, obtaining 100% accuracy is NOT surprising; it's actually just a proof that the code is reasonably bug-free.
***
It's a proof that the algorithms embodied in this portion of the code
are actually up to the task. Not just a proof that the code is
relatively bug-free, except in a broad sense of "bug" as "algorithm
that doesn't fulfill the intended goals"
***
Such proofs are good to have, but it's not theoretically interesting.
***
I think it's theoretically somewhat interesting, because there are a
lot of possible ways to do clustering and grammar rule learning, and
now we know a specific combination of clustering algorithm and grammar
rule learning algorithm that actually works (if the input dependency
parses are good)
Then the approach would be
--
You received this message because you are subscribed to the Google Groups "opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email to opencog+u...@googlegroups.com.
To post to this group, send email to ope...@googlegroups.com.
Visit this group at https://groups.google.com/group/opencog.
To view this discussion on the web visit https://groups.google.com/d/msgid/opencog/CAHrUA34BzxwmJMeMLT%2Byd_ih14RE6Y3S86XMPKEtCTG7URQKmA%40mail.gmail.com.
Linas, how would you "weight the disjuncts"?
We know how to weight the words (by frequency), and word pairs (by MI).
But how would you weight the disjuncts?
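For what it's worth, one frequency-based option (an assumption on my part, not a scheme anyone in the thread has endorsed) is to weight each disjunct by its surprisal given the germ word, by analogy with weighting word pairs by MI:

```python
import math
from collections import Counter

def disjunct_weights(observations):
    """observations: a list of (word, disjunct) pairs, where a disjunct is
    e.g. a tuple of connector labels. Returns surprisal -log2 p(disjunct|word)
    for each observed (word, disjunct) combination - one possible weighting.
    Must be a list (it is iterated twice)."""
    word_totals = Counter(w for w, _ in observations)
    pair_counts = Counter(observations)
    return {(w, d): -math.log2(pair_counts[(w, d)] / word_totals[w])
            for w, d in pair_counts}
```

Rare disjuncts of a word get large weights (high surprisal), dominant ones get weights near zero; one could equally use raw counts or p(disjunct|word) directly, depending on what the downstream clustering expects.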
-Anton
24.04.2019 4:13, Linas Vepstas wrote:
--
You received this message because you are subscribed to the Google Groups "lang-learn" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lang-learn+...@googlegroups.com.
To post to this group, send email to lang-...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lang-learn/CAHrUA35datvKktVoaJgQk2fbq36t7a2wvWP3EXJ3Wrwaw8UtcQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
Ben, Linas, here is the full set of results generated by Alexey:
Results update:
MWC(GT) MSL(GT) PA F1
5 2
5 3
5 4
5 5
5 10
5 15
5 25
(the PA and F1 values for these rows are missing in this copy of the message)
I just thought that "Currently, Identical Lexical Entries (ILE) algorithm builds single-germ/multi-disjunct lexical entries (LE) first, and then aggregates identical ones based on unique combinations of disjuncts" was sufficient.
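As a reading aid, the aggregation step described in that sentence could look roughly like this - a sketch reconstructed from the description only, with names of my own; the real implementation is in the clustering.py file linked in the thread:

```python
from collections import defaultdict

def aggregate_identical_entries(word_disjuncts):
    """word_disjuncts: dict mapping word -> set of its observed disjuncts.
    Merge words whose disjunct sets are identical into one lexical entry,
    as I understand the ILE aggregation step."""
    clusters = defaultdict(list)
    for word, djs in word_disjuncts.items():
        # frozenset makes the unique combination of disjuncts hashable
        clusters[frozenset(djs)].append(word)
    return [(sorted(words), sorted(djs)) for djs, words in clusters.items()]
```

Since only exactly identical disjunct sets merge, every word seen with a slightly different set gets its own entry - which is consistent with calling ILE "over-fitting".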
In the meantime, it is in the code:
https://github.com/singnet/language-learning/blob/master/src/grammar_learner/clustering.py#L276
Cheers,
-Anton
23.04.2019 16:54, Ben Goertzel wrote:
Hey did my last message show up in spam again? :P
Hi Linas, I am re-reading your emails and updating our TODO issues from some of them.
Not sure about this one:
>Did Deniz Yuret falsify his thesis data? He got better than 80% accuracy; we should too.
I don't recall Deniz Yuret comparing MST-parses to LG-English-grammar-parses.
a) Seemingly "worse than LG-English" "sequential parses" provide seemingly better "LG grammar" - that may be some mistake, so we will have to double-check this.
On Sun, May 5, 2019 at 10:15 PM Anton Kolonin @ Gmail <akol...@gmail.com> wrote:
> Hi Linas, I am re-reading your emails and updating our TODO issues from some of them.
> Not sure about this one:
> >Did Deniz Yuret falsify his thesis data? He got better than 80% accuracy; we should too.
> I don't recall Deniz Yuret comparing MST-parses to LG-English-grammar-parses.
Linas: Where does the >80% figure come from? This paper of Yuret's cites 53% accuracy compared against "dependency parses derived from dependency-grammar-izing Penn Treebank parses on WSJ text" .... It was written after his PhD thesis. Is there more recent work by Yuret that gives massively better results? If so I haven't seen it.
Spitkovsky's more recent work on unsupervised grammar induction seems to have gotten better statistics than this, but it used radically different methods.
> a) Seemingly "worse than LG-English" "sequential parses" provide seemingly better "LG grammar" - that may be some mistake, so we will have to double-check this.
Anton -- Have you looked at the inferred grammar for this case, to see how much sense it makes conceptually?
Using sequential parses is basically just using co-occurrence rather than syntactic information.
I wonder what would happen if you used *both* the sequential parse *and* some fancier hierarchical parse as inputs to clustering and grammar learning? I.e. don't throw out the information of simple before-and-after co-occurrence, but augment it with information from the statistically inferred dependency parse tree...
-- Ben
Anton, sequential and random parses are in D56 and D57. Or do you want specifically the ones for GS and SS? If so, please tell me where you want them, to avoid messing with your file structure.
Yes, the mix of distance and MI is what we have been doing when we
use the distance weighting in MST parsing. But as I noticed
before, we should find a good tuning for each case, because the
MI's vary by about two orders of magnitude.
a.
Andres, can you upload the sequential parses that you have evaluated and provide them in the comments to the cells?
Ben, I think the 0.67-0.72 corresponds to the naive impression that 2/3-3/4 of word-to-word connections in English are "sequential" and the rest are not. For Russian and Portuguese it would be somewhat less, I guess.
What you suggest here ("used *both* the sequential parse *and* some fancier hierarchical parse as inputs to clustering and grammar learning? I.e. don't throw out the information of simple before-and-after co-occurrence, but augment it with information from the statistically inferred dependency parse tree") can be simply (I guess) implemented in the existing MST-Parser, given the changes that Andres and Claudia made a year ago.
That could be tried with the "distance_vs_MI" blending parameter in the MST-Parser code, which accounts for word-to-word distance: distance_vs_MI=1.0 would give "sequential parses", distance_vs_MI=0.0 would produce "pure MST-Parses", distance_vs_MI=0.7 would provide "English parses", and distance_vs_MI=0.5 would provide "Russian parses" - does it make sense, Andres?
Ben, do you want to let Andres try this - get parses with different distance_vs_MI values in the range 0.0-1.0 and see what happens?
This could be tried both ways using traditional MI or DNN-MI, BTW.
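The distance_vs_MI blend described above could be sketched as a per-link score like this (`link_score` is my naming, and the exact formula in the pipeline's MST-Parser may differ - Andres notes the MI magnitudes vary by about two orders, so some normalization would likely be needed in practice):

```python
def link_score(mi, distance, distance_vs_mi):
    """Blend mutual information with a closeness bonus for one candidate link.
    distance_vs_mi = 1.0 -> distance dominates (sequential-like parses),
    distance_vs_mi = 0.0 -> pure MI (plain MST parses).
    `distance` is the word-position gap, >= 1 for adjacent words."""
    alpha = distance_vs_mi
    # 1/distance rewards adjacency; at alpha=1 the parser always prefers
    # nearer pairs, degenerating to a sequential parse.
    return (1.0 - alpha) * mi + alpha * (1.0 / distance)
```

The MST step then maximizes the sum of these scores over the spanning tree, so sweeping alpha from 0.0 to 1.0 interpolates between pure-MST and sequential parsing.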
Cheers,
-Anton
06.05.2019 12:30, Ben Goertzel wrote:
-- -Anton Kolonin skype: akolonin cell: +79139250058
>Actually, one of my proposals from the previous block of emails was to make MST worse! I'm so sick of hearing about MST that I proposed getting rid of it, and replacing it with something of lower-quality, and focus on the clustering and disjunct weighting schemes to improve accuracy.
What do you mean by "something"?
b) We have been studying the Pearson(parses,grammar) for MWC=1, and it may happen that MWC>1 will change the pattern,
>I'm fairly certain that replacing MST with something lower-quality will still work well. If that is not the case, then that means that the disjunct-processing stages are somehow being done wrong. The final result should not depend very much on the accuracy of MST. And this does not require a huge corpus, either. If there is a strong dependence on MST, something is seriously wrong, seriously broken in the disjunct-processing stages. We need to spend energy on fixing that brokenness and not on making MST better.
Again, if you "get rid of MST", where do you get the disjuncts? Just use "all possible combinations of high-MI word pairs in a sentence"?
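For reference, the MST step under discussion is a maximum spanning tree over word-pair MI scores (in the spirit of Yuret's thesis). A compact sketch, ignoring the projectivity constraint real MST parsers enforce (`mst_parse` and the input format are my own naming):

```python
def mst_parse(words, mi):
    """Greedy (Prim-style) maximum spanning tree over word-pair MI.
    `mi` maps frozenset({i, j}) -> score; unseen pairs score -inf.
    Returns a list of (i, j) links forming a tree over the sentence."""
    n = len(words)
    in_tree = {0}
    links = []
    while len(in_tree) < n:
        # pick the highest-MI link connecting the tree to a new word
        best = max(((i, j) for i in in_tree
                    for j in range(n) if j not in in_tree),
                   key=lambda p: mi.get(frozenset(p), float("-inf")))
        links.append(tuple(sorted(best)))
        in_tree.add(best[1])
    return links
```

Replacing this with "something lower-quality", as Linas suggests, would mean swapping the MI scores (or the tree construction) for a cheaper signal while keeping the downstream disjunct extraction unchanged.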
Does this issue seem like a spec for that?
The explanation, a link to the current code, and a suggestion for improvement are here:
The definitions that we use based on your "Sheaves" work can be found here:
It turns out the difference between applying MWC for both GL and GT (lower block) and for GT only (upper block) is negligible - applying it for GL makes results 1% better.
So far, testing on full LG-English parses (including partially parsed) as a reference:
As we know, MWC=2 is much better than MWC=1 and no improvement further.
"Sequential parses" rock, MST and "random" parses suck.
Pearson(parses,grammar) = 1.0
Alexey is running this with "silver standard" for MWC=1,2,3,4,5,10
-Anton
CAUTION: *** the parses in the folder with the dict files are not the inputs but the outputs - they are produced on the basis of the grammar in the same folder; I am listing the input parses below !!! ***
- row 63: learned NOT from parses produced by the DNN, BUT from honest MST-Parses; however, the MI values for them were extracted from the DNN and made specific to the context of every sentence, so each pair of words could have different MI values in different sentences:
exported in the new "ull" format invented by Man Hin:
Regarding what you call "the breakthroughs":
>Results from the ull-lgeng dataset indicate that the ULL pipeline is a high-fidelity transducer of grammars. The grammar that is pushed in is effectively the same as the grammar that falls out. If this can be reproduced for other grammars, e.g. Stanford, McParseface or some HPSG grammar, then one has a reliable way of tuning the pipeline. After it is tuned to maximize fidelity on known grammars, then, when applied to unknown grammars, it can be assumed to be working correctly, so that whatever comes out must in fact be correct.
That has been worked on according to the plan set up back in 2017. I am glad that you accept the results. Unfortunately, the MST-Parser is not built into the pipeline yet, but it is on the way.
If someone like you could help with the outstanding work items, it would be appreciated, because we are short-handed now.
>The relative lack of differences between the ull-dnn-mi and the ull-sequential datasets suggests that the accuracy of the so-called "MST parse" is relatively unimportant. Any parse giving results with better-than-random outputs can be used to feed the pipeline. What matters is that a lot of observation counts need to be accumulated so that junky parses cancel each other out, on average, while good ones add up and occur with high frequency. That is, if you want a good signal, then integrate long enough that the noise cancels out.

I would disagree (and I guess Ben may disagree as well), given the existing evidence with the "full reference corpus".
If you compare the F1 for LG-English parses with MST > 2 on the "MWC-Study" tab, you will find that the F1 on LG-English parses is decent, so it is not that "parses do not matter"; it is rather that "MST-Parses are even less accurate than sequential ones".
Still, we have got a "surprise-surprise" with the "gold reference corpus". Note, it still says "parses do matter, but MST-Parses are as bad or as good as sequential ones, and both are still not good enough". Also note that it has been obtained on just 4 sentences, which is not reliable evidence.
Now we are working full-throttle on proving your claim with the "silver reference corpus" - stay tuned...
Cheers,
-Anton
22.06.2019 5:38, Linas Vepstas:
-- -Anton Kolonin skype: akolonin cell: +79139250058
Hi,
I think everyone understands that
***
The third claim, "the Linas claim", that you love to reject, is that
"when ULL is given a non-lexical input, it will converge to the SAME
lexical output, provided that your sampling size is large enough".
***
but it's not clear for what cases feasible-sized corpora are "large enough" ...