The PDF attachment revises an earlier 7 May 2017 report sent out on this mailing list. It has new graphs, better notation, and, most importantly, analyzes a bigger dataset. Some of the more mysterious figures turn out to be Gaussians (bell curves)! This is actually quite unexpected, since most of the other distributions are Zipfian. I don't know what kind of network theory results in Gaussian distributions. Cosine similarity looks much more promising than whatever I said before.
Unfortunately, I've made little progress since the last report. There's a reason for this. Sometime around May, a critical bug was introduced, but not caught until late June: the sign on all MI values was reversed. So I was churning out large datasets where the MST parses were the worst-possible parses, instead of the best-possible! Creating large datasets is like watching paint dry. It's pretty mind-numbing. So I lost a month or two to that.
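To illustrate why a sign flip is so damaging, here is a minimal sketch, not the actual pipeline code. It assumes an MST parse is simply a maximum spanning tree over word-pair MI scores, and it ignores any further constraints a real parser may impose. The MI values, the function names and the four-word sentence are all hypothetical.

```python
# Hypothetical MI scores between word positions 0..3 of one sentence.
# These numbers are invented for illustration only.
MI = {
    (0, 1): 5.0, (1, 2): 4.0, (2, 3): 3.0,
    (0, 2): 1.0, (1, 3): 0.5, (0, 3): -1.0,
}

def mi(i, j):
    """Symmetric lookup of the (hypothetical) MI between positions i and j."""
    return MI[(min(i, j), max(i, j))]

def max_spanning_tree(n, score):
    """Prim's algorithm: grow a tree from node 0, always adding the
    highest-scoring edge that reaches a node not yet in the tree."""
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        best = max(
            ((i, j) for i in in_tree for j in range(n) if j not in in_tree),
            key=lambda e: score(*e),
        )
        edges.append(best)
        in_tree.add(best[1])
    return edges

def total_mi(edges):
    """Sum of the true MI over the edges of a parse."""
    return sum(mi(i, j) for i, j in edges)

# Correct sign: the tree maximizes total MI.
good_parse = max_spanning_tree(4, mi)

# Reversed sign: the same code now minimizes total MI.
bad_parse = max_spanning_tree(4, lambda i, j: -mi(i, j))

print("correct sign :", good_parse, total_mi(good_parse))
print("reversed sign:", bad_parse, total_mi(bad_parse))
```

With the sign reversed, the same tree-building code links exactly the word pairs with the lowest MI, so every parse in the dataset is the worst spanning tree rather than the best one.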
At the same time as this was going on, there was also a different kind of error, not in the data processing, but in the analysis. I was attempting to remove "noise" from the datasets -- as well as cut down the size to make them more manageable. I only recently realized that I was discarding most of the "signal" with the noise. I was cutting down the dataset by removing infrequently-observed disjuncts. An unfortunate side-effect was that this sharply raised cosine similarity between most word pairs -- even grammatically unrelated pairs. Most of the top 800 words had a similarity of greater than 0.7, which was an absurd, untenable situation. Between these two errors, it was very hard to see what was going on; it was confusing. The confusion is now over, but it took about two months to get past it.
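The trimming effect can be reproduced with a toy example. This is a sketch under stated assumptions, not the analysis code from the report. It assumes each word is a vector of disjunct observation counts, and that trimming drops every disjunct whose total count across all words falls below a threshold. The words, disjunct labels and counts are invented.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(c * v.get(d, 0) for d, c in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def trim(vectors, min_count):
    """Drop every disjunct whose total count over all words is below min_count.
    (Assumed trimming rule; the real cut may have been defined differently.)"""
    totals = Counter()
    for vec in vectors.values():
        totals.update(vec)
    return {
        word: {d: c for d, c in vec.items() if totals[d] >= min_count}
        for word, vec in vectors.items()
    }

# Two hypothetical words: they share two frequent disjuncts, but each also
# has a long tail of 50 disjuncts seen only once, and the tails do not overlap.
vectors = {
    "word_a": {"common_1": 5, "common_2": 5,
               **{f"rare_a_{i}": 1 for i in range(50)}},
    "word_b": {"common_1": 5, "common_2": 5,
               **{f"rare_b_{i}": 1 for i in range(50)}},
}

before = cosine(vectors["word_a"], vectors["word_b"])

trimmed = trim(vectors, min_count=2)
after = cosine(trimmed["word_a"], trimmed["word_b"])

print(f"before trimming: {before:.3f}")
print(f"after trimming:  {after:.3f}")
```

In this toy case the rare disjuncts are exactly what tells the two words apart. Once they are cut, only the shared frequent disjuncts remain and the similarity jumps from 0.5 to 1.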
Anyway, this required a redo of the May report to disentangle what's what. The new, improved report is attached. The cosine-similarity graph on page 41 is worth a look. Yes, it's 48 pages long. A lot of work.