Parsey McParseface and Google SyntaxNet

Jeremy Zucker

May 13, 2016, 1:38:39 PM
to ope...@googlegroups.com
Hi folks,

 I would imagine that Google's release of SyntaxNet in TensorFlow will provide a valuable boost to OpenCog's Natural Language Understanding efforts.  Although SyntaxNet cannot resolve common-sense Winograd-style tests, it frees up efforts like OpenCog to work on those kinds of problems rather than on the kinds of things that SyntaxNet already handles.


Sincerely,

Jeremy

"Any sufficiently complex system is hackable"
http://www.linkedin.com/profile/view?id=4389790

Ben Goertzel

May 14, 2016, 2:55:37 AM
to opencog
Parsey McParseface aka SyntaxNet does indeed look like a valuable resource!

I would clarify that the OpenCog project has not really put in any
significant effort on syntax parsing, ever.... Linas of course has
personally put a lot of time and energy into the link parser, and
Ruiting and Rodas spent a couple months improving the link parser's
handling of comparatives a few years ago... RelEx (which
postprocesses link parses to make more abstract semantic dependency
relations) was built in 2004 or so for a Novamente LLC commercial
contract and then put into OpenCog...

So it's not like SyntaxNet will eliminate any work we're doing or have
been doing.... However, it may be worth evaluating at some point
whether to include SyntaxNet as an option in OpenCog alongside the
link parser...

I feel the link parser exposes the underlying mathematical structure
of language in an interesting, AI-relevant way that SyntaxNet and
other similar dependency parsers do not (at least not on the
surface). OTOH there are fairly straightforward mathematical
conversions between link-grammar-type formats and SyntaxNet-type
formats, also...
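To make that last point concrete, here is a minimal sketch (my illustration, not anything from the thread) of one such conversion: Link Grammar emits undirected, typed links between word pairs, while SyntaxNet-style dependency formats want directed head -> dependent arcs, so a naive conversion orients the links by walking outward from a chosen root word. The example sentence, link labels, and root choice below are all hypothetical, not real parser output.

```python
# Orient Link Grammar's undirected links into dependency-style
# head -> dependent arcs by breadth-first search from a chosen root.
from collections import deque

def links_to_dependencies(n_words, links, root):
    """Turn undirected (i, j, label) links into directed
    (head, dep, label) arcs, oriented away from the root index."""
    adjacency = {i: [] for i in range(n_words)}
    for i, j, label in links:
        adjacency[i].append((j, label))
        adjacency[j].append((i, label))
    arcs, seen, queue = [], {root}, deque([root])
    while queue:
        head = queue.popleft()
        for dep, label in adjacency[head]:
            if dep not in seen:
                seen.add(dep)
                arcs.append((head, dep, label))
                queue.append(dep)
    return arcs

# "the dog ran" with hypothetical LG-style links:
# D connects determiner to noun, S connects subject to verb.
links = [(0, 1, "D"), (1, 2, "S")]
print(links_to_dependencies(3, links, root=2))
# -> [(2, 1, 'S'), (1, 0, 'D')]
```

The real conversion is fussier (link-grammar links don't always form a tree, and label mappings are not one-to-one), but this is the basic shape of it.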

What Linas, Ruiting and I have been muttering about for a long time is
using unsupervised learning to drive concurrent learning of syntax and
semantics (and recently, as we've been playing a lot with robots,
we've been drifting toward the idea of doing this in a way that
incorporates "embodied" data like vision, audition, movement,
etc.).... As you note, SyntaxNet doesn't do that. But it might
prove a valuable resource encoding "prior distributions" on the syntax
aspect, which could be useful for nudging this sort of learning in the
right direction. Lots to think about and explore...

-- Ben



--
Ben Goertzel, PhD
http://goertzel.org

"When in the body of a donkey, enjoy the taste of grass." -- Tibetan saying

Jedi Knight

Jun 28, 2016, 5:38:55 PM
to opencog
Why is there no SyntaxNet Google group?

Linas Vepstas

Jun 28, 2016, 10:47:48 PM
to opencog, link-grammar, Word Grammar
I like it .. it seems to be done well.

There are issues, though:
-- it's English-language only
-- it's unclear how it handles questions
-- it's unclear how it handles speech acts that aren't grammatical
-- it's not clear how it handles pauses, fumblings, hemming and hawing, shouting, whispering
-- it assumes that everything is a dependency tree
-- it does not include any way to provide feedback from the reasoning and logic layers back to the parsing layer.

Well ... the same is true for link-grammar at this time, but LG has a clear path on how to handle much or most of the above.  It took me a while to understand that there is a lot more to parsing than simply creating dependency trees -- if that were all there is to it, then most parsers would be "good enough" with any accuracy over, say, 90% on some "typical" corpus.  But accuracy is not really the issue -- that is not where the problem lies.

A large part of the problem is that tweets aren't grammatical, but people still understand them.  People abuse "fixed phrases" and "institutional utterances" all the time, creating new language in that way: "Just do it", "Where's the beef?", "Mission accomplished".  Synonymous phrases appear and disappear all the time -- I think Boing Boing had a good exploration of Archie comics from 60 years ago, pointing out how much the language has changed since then -- teens just don't talk like that any more.

English has a way of obscuring some of these issues because it encodes a lot of meaning in word order, as opposed to morphology. As a result, English speakers are much less aware of using institutional phrases, of how important they are for semantics, and of how much they control the change and usage of language.

The upshot is that dependency parsing is really just the tip of the iceberg.  The good stuff, the interesting stuff lies elsewhere. 

Anyway ... I almost wrote that it would not be hard to slot SyntaxNet in place of RelEx.  But that would not be quite correct.  First of all, RelEx is inadequate for some of the things that OpenCog needs, and so OpenCog falls back to link-grammar in those areas where RelEx is broken.  Another major issue is how questions get handled -- RelEx does some funky stuff to try to replace the object of a question with a variable, and R2L uses that.

Well, even if you got past that, it's not clear that R2L is doing the right things, since it's again a rule-driven system, running on hand-written rules, when it should really be learning rules via some machine-learning mechanism ... and all of that needs to feed back into the parse itself.

So perhaps SyntaxNet is a step forward, but without a holistic approach, I don't see much utility for it.

I can also put it a different way: prior to SyntaxNet, I believe that Link Grammar was the most accurate English-language parser out there, and the amount of attention that generated was just minuscule.  There is very little demand for parsers, and basically, nobody quite knows what to do with them.

For example: with LG, people come by, they try it out, ask a few questions, and go their own way.  The information that a parser provides is just not relevant to most software and machine-learning folks.  The knowledge-extraction crowd pays lip service, but then doesn't actually use it as a tool.  There's no real community, because there are no real applications.  No one knows what to do with a parser.

--linas 

