Chinese, English databases


Linas Vepstas

Jul 8, 2017, 9:40:47 PM
to Ben Goertzel, opencog, Ruiting Lian, link-grammar
Ben,

Attached is a short report on some (not all) of the databases I have.  I'll try to push out a report on Chinese MST parses later today or tomorrow.

There's a big part that's still missing/unreported for English: how the MST parses compare to the disjunct parses. That requires more work, and might take a few more days. It all depends.

--linas
hanzi.pdf

Alex

Jul 9, 2017, 7:15:14 AM
to opencog
Probably off-topic: while reading about the OpenCog community's efforts in NLP, I am quite suspicious of statistical methods. I think that the only meaningful approach to NLP is combinatory categorial grammars (Lambek calculus, Montague semantics); this effort translates natural-language sentences into logical expressions, i.e. lambda-calculus expressions. So, if there is a connection between Scheme, as a language of lambda calculus, and CCGs, then CCGs are a way of translating NL sentences directly into Scheme structures. Besides, the CCG approach is a white-box approach, with an understanding of the semantics of natural language; this semantic knowledge can also be encoded as Scheme/OpenCog structures, and can be learned or enhanced over time.

Of course, a raw statistical approach may in the end give the same results, but a structured approach can be more feasible. Besides, a statistical approach yields all-or-nothing results, while the CCG approach yields results that improve step by step; such gradually improving understanding reflects the human approach to the world and language: humans progressively learn a language, its syntax and its semantics. If we have the slightest doubt about the existence of a perfect understanding of language, then we must also have doubts about the efficiency of the statistical approach.

Ben Goertzel

Jul 9, 2017, 7:21:30 AM
to opencog
CCG is fairly straightforwardly mappable into link grammar; and our
"statistical" approach is based on using unsupervised pattern mining
(in a particular complex way) to induce a link grammar for a
language...

So we (i.e. Linas and I especially) basically agree with CCG as a
formalism, but we want learned, not hand-coded, grammar rules...

Also, the hand-coded link grammar dictionary is simply much bigger
and better than any existing CCG dictionary.... Someone could write a
script to transform it into a (complicated and ugly) CCG if they
wanted...

-- Ben G
> --
> You received this message because you are subscribed to the Google Groups
> "opencog" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to opencog+u...@googlegroups.com.
> To post to this group, send email to ope...@googlegroups.com.
> Visit this group at https://groups.google.com/group/opencog.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/opencog/393b88c8-aadd-456c-bd84-eaac92b55fd8%40googlegroups.com.
>
> For more options, visit https://groups.google.com/d/optout.



--
Ben Goertzel, PhD
http://goertzel.org

"I am God! I am nothing, I'm play, I am freedom, I am life. I am the
boundary, I am the peak." -- Alexander Scriabin

Linas Vepstas

Jul 9, 2017, 2:23:44 PM
to opencog
Link Grammar is a certain kind of CCG. -- linas


Linas Vepstas

Jul 9, 2017, 2:44:58 PM
to opencog, link-grammar, Alex, Bob Coecke
My answer was too brief. Link-grammar is a kind of CCG, where the morphisms are given streamlined, less clunky labels, called "types". A branch of mathematics called "type theory" explains how types and categories are dual to each other, and how talking about types often provides a simpler, deeper insight than talking about categories. But they are the "same thing" in a certain sense.

More precisely, categories have an "internal hom" functor, and that hom defines the "internal language" of the category. The most famous example is that the "internal language" of cartesian closed categories is the simply-typed lambda calculus. This can be seen as a form of the Curry-Howard correspondence. John Baez and Bob Coecke have written extensively on this topic; before them were Lambek and many others.

Roughly speaking, types describe how morphisms can be composed; they describe allowed combinations.
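To make "types describe how morphisms can be composed" concrete, here is a minimal Python sketch (my own toy illustration, not OpenCog or link-grammar code): each morphism carries an explicit source and target type, and composition is defined only when the types line up.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Morphism:
    """A morphism with explicit source and target types."""
    src: str
    tgt: str
    fn: Callable

    def compose(self, other: "Morphism") -> "Morphism":
        # self . other -- first apply `other`, then `self`.
        # The types say which compositions are allowed:
        if other.tgt != self.src:
            raise TypeError(
                f"cannot compose {self.src}->{self.tgt} "
                f"after {other.src}->{other.tgt}")
        return Morphism(other.src, self.tgt, lambda x: self.fn(other.fn(x)))

length = Morphism("Str", "Int", len)            # Str -> Int
double = Morphism("Int", "Int", lambda n: 2 * n)  # Int -> Int

twice_len = double.compose(length)  # Str -> Int: the types match, so allowed
print(twice_len.fn("cat"))          # 6
```

Composing the other way around, `length.compose(double)`, raises a TypeError, because a Str-to-Int arrow cannot follow an Int-to-Int arrow: that is the whole "allowed combinations" point.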

I think I wrote previously about how I believe we can beat other research: by using what are called "sections of a sheaf" in mathematics, which are called "disjuncts" in link grammar.  Here's why:

Pretty much all work on meaning uses vector spaces (e.g. word2vec, and so on), and vector spaces are a certain kind of category, with a certain kind of internal language, that we know, a priori, is not quite right for the categories that describe language. To overcome this limitation, I believe that by using sections of a sheaf (i.e. link-grammar disjuncts) we can locally "glue together" the necessary semantic space, in the correct shape, instead of assuming that it is flat, which is what vector spaces do.

Of course, I am just throwing around big words here. The actual work is harder, and I just spent the last month creating a dataset which turns out to have a deep, maybe fatal, flaw. It's a lot of work to take a small step.

--linas
  

Linas Vepstas

Jul 9, 2017, 3:34:41 PM
to opencog, link-grammar, Alex
What Ben is saying here is that essentially all grammars, head-driven phrase-structure grammars (HPSG), dependency grammars (DG), and CCGs, are known to be mathematically isomorphic to one another, with "well-known" algorithms that convert from one to the other.

That is .. if the grammars have been expressed in a sufficiently mathematical fashion for an algorithm to be developed for them. Most papers in linguistics are simply not that precise, for various historical and cultural reasons.

There's also the "devil in the details". Just because you can translate an HPSG into a DG or a CCG does not mean that the kinds of behaviors and relationships discussed in one system can be trivially converted into factual statements in another system.

The focus of attention in these different systems is different, and I don't know of any (detailed) work which attempts to actually probe the isomorphism between these different systems, and how statements made in one system correspond to statements made in other systems.  I think that this kind of filling-in-the-details will be necessary, and that, without it, the various different grammar camps will continue to engage in dogmatic warfare.

--linas




Linas Vepstas

Jul 9, 2017, 11:50:32 PM
to Bob Coecke, opencog, link-grammar, Alex
Hi Bob,

Thanks for the response. At some later date, I would like to talk more. Yes, it's the non-Cartesian-ness of it all that I think I now know how to handle.  Very briefly: in the original papers on link-grammar (1991-1993), they explained it by drawing pictures of jigsaw puzzle pieces.  One of the pop-sci reports of your work has a diagram of ... jigsaw puzzle pieces.  I slapped my forehead. The current realization is that the "jigsaw pieces" are exactly the same thing as the local sections of a sheaf, and so I am busy data-mining those.

As I just discovered a terrible, terrible bug in my code (a dropped minus sign) earlier today, a month or two of data collection is wiped out. :-( That's life.  Back to work, and slightly less conversation from me, just right now.

--linas



On Sun, Jul 9, 2017 at 9:44 PM, Bob Coecke <bob.c...@cs.ox.ac.uk> wrote:
Dear Linas, thanks for including me here.  One crucial thing about language is that it is manifestly "non-Cartesian", and may well be at the total opposite end of the spectrum of compositional structures, if one follows Lambek in the 2000s.  The upshot of this is that it gives language a leading role in a spectrum of theories across many disciplines where these "anti-Cartesian" structures rule  [ :) ].  I have written a couple of pedestrian/popular papers about this:

From quantum foundations via natural language meaning to a theory of everything
https://arxiv.org/abs/1602.07618

An alternative Gospel of structure: order, composition, processes
https://arxiv.org/abs/1307.4038

and my recent book with Aleks Kissinger, although in the first instance about quantum theory, provides a framework on compositionality that applies equally well to language (as we explain in some advanced-material sections):

http://www.cambridge.org/pqp

Linas Vepstas

Jul 10, 2017, 2:20:23 AM
to Bob Coecke, opencog, link-grammar, Alex
On Sun, Jul 9, 2017 at 11:14 PM, Bob Coecke <bob.c...@cs.ox.ac.uk> wrote:

Interesting, need to understand more there...

On the one hand, I feel like I'm giving away my best secret. On the other hand, it's "obvious", and you will recognize it immediately.

I'm repeating it here because I really want the other readers on the mailing list to grok the ideas.

By example/analogy:

A common, almost canonical way to describe a graph is to list all of the vertexes, and all of the edges. This gives a "global" description of the graph. You get all of it, in one big gulp.

Another way to describe it, uncommon but perfectly valid, is to list the vertexes, and, associated with each vertex, the set of edges coming off of it.  (It is sometimes convenient to describe "half of an edge" for this purpose.)

Each pair (vertex, {set of edges originating/terminating on that vertex}) is a local section of the graph. Glue these together, you get the whole graph.
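As a tiny illustration of the two descriptions (this is my own sketch, not anything from the LG code base): converting a global edge list into per-vertex local sections, and gluing the sections back together to recover the whole graph.

```python
def to_sections(vertices, edges):
    """Global description -> local sections: each vertex is paired with
    the set of edges incident to it (its "half-edges")."""
    sections = {v: set() for v in vertices}
    for (a, b) in edges:
        sections[a].add((a, b))  # the edge, as seen from vertex a
        sections[b].add((a, b))  # the same edge, as seen from vertex b
    return sections

def glue(sections):
    """Local sections -> global description: the union of all the
    per-vertex edge sets recovers the original edge set."""
    return set().union(*sections.values())

V = ["the", "cat", "sat"]
E = {("the", "cat"), ("cat", "sat")}
S = to_sections(V, E)
assert glue(S) == E  # gluing the local sections gives back the graph
```

Each `(v, S[v])` pair is one local section; `glue` is the (trivial, for plain graphs) gluing step.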

In link-grammar, these pairs are called "disjuncts" and are the basic entries in the LG dictionary.

In the puzzle-piece analogy, these pairs are puzzle-pieces, and snapping them together in a legal fashion is the act of parsing a sentence.

Each "tab" on a jigsaw puzzle piece is called a "connector" in LG.

In CCG, each "connector" has (for example) the form of NP\S or S/NP\VP and so on.  In a pregroup grammar, I guess you would write it as "sat cat_L^-1 mat_R^-1" or something like that (for the verb in "the cat sat on the mat").  In LG, these rather verbose expressions are replaced by a simple label, called the "link type". As it happens, the link type is more-or-less the same thing as a type in type theory.

So, really, for language, I should write (word, {partially-ordered set of connectors}) instead of (vertex, {set of edges originating/terminating on that vertex})
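As a toy illustration of the (word, {ordered connectors}) idea, here is a hypothetical Python sketch of a three-word "dictionary" of disjuncts, where a parse succeeds when left-pointing ('-') and right-pointing ('+') connectors of the same link type snap together like puzzle tabs. The words, link-type labels, and greedy adjacent-word linker are simplified inventions for illustration; a real LG parser searches over all disjuncts and allows non-adjacent (planar, non-crossing) links.

```python
# Each dictionary entry is a disjunct: (word, ordered list of connectors).
# A connector is a link type plus a direction: '-' wants a partner to the
# left, '+' wants a partner to the right -- the "tabs" on the puzzle piece.
TOY_DICT = {
    "the": [[("D", "+")]],              # determiner: links to a noun on its right
    "cat": [[("D", "-"), ("S", "+")]],  # noun: wants a D on its left, S on its right
    "sat": [[("S", "-")]],              # verb: wants a subject on its left
}

def parse(words):
    """Greedy linker for adjacent words only: each '+' connector on word i
    must snap into a matching '-' connector on word i+1. Returns the list
    of links made, or None if the words don't fit together."""
    # take the first disjunct of each word (a real parser tries all of them)
    stacks = [list(TOY_DICT[w][0]) for w in words]
    links = []
    for i in range(len(words) - 1):
        while stacks[i] and stacks[i][-1][1] == "+":
            ltype, _ = stacks[i].pop()
            if not stacks[i + 1] or stacks[i + 1][0] != (ltype, "-"):
                return None  # the puzzle pieces don't fit
            stacks[i + 1].pop(0)
            links.append((words[i], ltype, words[i + 1]))
    # a legal parse uses up every connector on every piece
    return links if all(not s for s in stacks) else None

print(parse(["the", "cat", "sat"]))
# [('the', 'D', 'cat'), ('cat', 'S', 'sat')]
```

Snapping the pieces together in a legal fashion, i.e. using up every connector, is exactly the act of parsing; an incomplete sentence like ["the", "cat"] leaves the S+ connector dangling and fails.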

Anyway, each pair (vertex, {set of edges originating/terminating on that vertex}) can be recognized as a section, in the sense of algebraic topology, where you have all this machinery for when you can glue them together, etc. In sheaf terms, I guess you can call them sections, and a dictionary entry would then be a germ or a stalk, or thereabouts. Although this analogy/terminology holds, it's perhaps a bit overblown, as I have not yet seen any reason to push hard on it; I have not yet had any insight that would require any of the fancier machinery of algebraic topology.  So it's a casual observation.

Similarly, you could call this (vertex, {set of edges originating/terminating on that vertex}) thingy a "diagram" in category theory, but that isn't quite the suitable analogy. It could be forced into working, but it would be clunky.  (So e.g. verbs are kind-of-ish like spans, but that's goofy, because there's no obvious limit to go with that, nor any obvious reason to even invoke the concept of a limit. So why am I doing so now? I dunno.)

So, roughly that's the correspondence and the general idea.

--linas


Alex

Jul 31, 2017, 3:58:50 PM
to opencog
Just for reference, there is a down-to-earth but very interesting and valuable article about generating a CCG lexicon from a Link Grammar lexicon: https://hal.archives-ouvertes.fr/hal-00487053/document . So now, as a fan of CCG, I will start to value efforts in the link-grammar area.

Linas Vepstas

Jul 31, 2017, 4:35:45 PM
to opencog, link-grammar
Yeah, I guess I should read that paper. Not having read it, I'll stick to my original assertion: link grammars and categorial grammars are more or less the same thing, just using different notation.

There are, however, a few tricks we've learned over the years:

-- Proposition 3 deals with cycle-free linkages, and it's now quite clear that cycles are strongly desirable for constraining the number of linkages, so any grammar that is cycle-free is going to lose.  Section 4.1.3 notes this.

-- Non-planar cycles are probably going to be particularly important, but that research has not yet been done (aka "landmark transitivity").

-- Assigning likelihoods is my current direction of research. It's where the rubber hits the road. It's what the word-vector people work on, and I think that LG/CCG offer approaches superior to flat vectors.

-- Repulsivity, or mutual repulsion/exclusion, is important for understanding. Given two similar, nearly identical parses, the correct one is the one that either aligns most closely with previous parses, or is most strongly different from previous parses. The middle ground must always be ceded as being wrong.  This is not a philosophical statement; rather, it is a statement about how human minds actually work when dealing with meaning.

Example: if you are looking at a bird dropping on the sidewalk, and someone says "look over here", they probably do NOT mean "look at this ladybug crawling on this leaf". They are directing attention at something very different, rather than at something else that is similar but narrow.

Linas
