Revised report on connector sets.

Linas Vepstas

Aug 7, 2017, 1:19:19 AM
to link-grammar, opencog, Ben Goertzel
The PDF attachment revises an earlier report, dated 7 May 2017, that was sent out on this mailing list. It has new graphs, better notation, and, most importantly, it analyzes a bigger dataset. Some of the more mysterious figures turn out to be Gaussians (bell curves)! This is quite unexpected, since most of the other distributions are Zipfian. I don't know what kind of network theory results in Gaussian distributions. Cosine similarity looks much more promising than whatever I said before.

Unfortunately, I've made little progress since the last report. There's a reason for this. Sometime around May, a critical bug was introduced, but not caught until late June: the sign on all MI values was reversed. So I was churning out large datasets where the MST parses were the worst-possible parses, instead of the best-possible! Creating large datasets is like watching paint dry. It's pretty mind-numbing. So I lost a month or two to that.
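To make the effect of the sign error concrete, here is a minimal sketch, not the actual pipeline code. It assumes an MST parse is a maximum spanning tree over the word pairs of a sentence, scored by MI, and it ignores any no-crossing-links constraint a real parser might impose. The sentence and all MI values are made up for illustration, and `spanning_tree` is a hypothetical helper name.

```python
def spanning_tree(n, score):
    """Prim's algorithm over nodes 0..n-1: repeatedly attach the
    highest-scoring edge that connects the tree to a new node."""
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        best = max(
            ((i, j) for i in in_tree for j in range(n) if j not in in_tree),
            key=lambda e: score[frozenset(e)],
        )
        edges.append(tuple(sorted(best)))
        in_tree.add(best[1])  # best[1] is the node that was outside the tree
    return sorted(edges)


def total_mi(edges, score):
    """Sum of the scores of the edges in a parse."""
    return sum(score[frozenset(e)] for e in edges)


# Hypothetical sentence and MI values (illustration only).
words = ["the", "dog", "barked", "loudly"]
mi = {
    frozenset((0, 1)): 5.0,
    frozenset((1, 2)): 4.0,
    frozenset((2, 3)): 3.0,
    frozenset((0, 2)): 1.0,
    frozenset((0, 3)): -1.0,
    frozenset((1, 3)): 0.5,
}

# The bug: every MI value has its sign reversed.
flipped = {pair: -value for pair, value in mi.items()}

good_parse = spanning_tree(len(words), mi)       # maximizes total MI
bad_parse = spanning_tree(len(words), flipped)   # minimizes total MI

print(good_parse, total_mi(good_parse, mi))
print(bad_parse, total_mi(bad_parse, mi))
```

With the sign reversed, the same maximizing algorithm returns the tree with the lowest total true MI, so nothing crashes or looks obviously wrong; the parses are just the worst ones available.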

While this was going on, there was also a different kind of error, not in the data processing, but in the analysis. I was attempting to remove "noise" from the datasets -- as well as cut down their size to make them more manageable. I only recently realized that I was discarding most of the "signal" along with the noise. I was cutting down the dataset by removing infrequently-observed disjuncts. An unfortunate side-effect was that this sharply raised the cosine similarity between most word pairs -- even grammatically unrelated pairs. Most of the top 800 words had a similarity greater than 0.7, which was an absurd, untenable situation. Between these two errors, it was very hard to see what was going on; it was confusing. The confusion is now over, but it took about two months to get past it.
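A toy illustration of why trimming can inflate cosine similarity, under assumptions of my own choosing rather than the report's actual procedure: each word is represented as a vector of disjunct counts, and a disjunct is dropped when its total count across all words falls below a threshold. The words, disjunct names, counts, and the `trim` helper are all hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(count * v.get(disjunct, 0) for disjunct, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def trim(vectors, min_total):
    """Drop every disjunct whose total count over all words is below min_total."""
    totals = Counter()
    for vec in vectors.values():
        totals.update(vec)
    return {
        word: {d: c for d, c in vec.items() if totals[d] >= min_total}
        for word, vec in vectors.items()
    }


# Two hypothetical, grammatically unrelated words: they share one frequent
# disjunct, and each has 20 rare disjuncts the other never uses.
vectors = {
    "wordA": {"common": 10, **{f"a{i}": 3 for i in range(20)}},
    "wordB": {"common": 10, **{f"b{i}": 3 for i in range(20)}},
}

full = cosine(vectors["wordA"], vectors["wordB"])

trimmed_vectors = trim(vectors, min_total=5)
trimmed = cosine(trimmed_vectors["wordA"], trimmed_vectors["wordB"])

print(f"before trimming: {full:.3f}")
print(f"after trimming:  {trimmed:.3f}")
```

In this toy case the rare disjuncts are exactly what distinguishes the two words. Once they are cut, only the shared frequent disjunct remains and the two vectors become parallel.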

Anyway, this required redoing the May report to disentangle what's what. The new, improved report is attached. The cosine-similarity graph on page 41 is worth a look. Yes, it's 48 pages long. A lot of work.

--linas


connector-sets-revised.pdf

Ben Goertzel

Aug 7, 2017, 3:00:42 AM
to link-grammar, opencog
Thanks Linas, I have downloaded the report and will read it carefully
as time permits!

A fascinating though arduous journey...

--
Ben Goertzel, PhD
http://goertzel.org

"I am God! I am nothing, I'm play, I am freedom, I am life. I am the
boundary, I am the peak." -- Alexander Scriabin